Introduction

The very nature of the global landscape in the 21st century can be characterized as one of uncertainty and unpredictability. Through the rise of globalization, a process compelled by investments and international trade, facilitated through the increase use of information technology, societies are becoming better integrated and more interconnected. However, one thing is certain: as globalization continues, strife amongst societies will persist. Our adversaries will continue to seek environment dominance and create an asymmetrical advantage by establishing regional hegemony to disrupt U.S. sanctuary at home or abroad. Bahrain is no exception as actors, both state and non-state, have sought to establish strategic control and influence over their population since 2011. These acts have resulted in violent and nonviolent protests occurring on a near daily basis within the kingdom. Therefore, developing a model to predict populace behavior is vital to U.S. strategic objectives.

Scenario

A team of analysts, LT Angeles, Capt Biles, and CPT Wren are assigned to 5th Fleet/US Navy Central Command (NAVCENT) located at NSA Bahrain. 5th Fleet has asked its analyst team to model the civil unrest within the island using open source data.

Collecting raw civil data within the Political, Military, Economic, Social, Information, Infrastructure, Physical Environment, and Time (PMESII-PT) variables allows the 5th Fleet analyst team to follow a commonly understood framework. By applying data mining techniques across open source websites, daily data beginning on January 1st, 2016, through December 31st, 2019, from across 191 PMESII-PT independent variables was gathered and collated. This time period resulted in a sample size, n, of 1461. By collecting data across PMESII-PT variables, one is able to develop a baseline understanding of an operational environment that is comprehended universally by military leaders. It is important to note that each operational environment is unique and that while some PMESII-PT variables are applicable, each environment requires its own data collection plan. While individual variables do not dominate every operational environment, they are all present and require careful consideration; ignoring one or more of these variables can negatively impact military missions and degrade a commander’s understanding of the operational environment.

Codebook

The predictors in master_df.csv utilized Political-Military-Economic-Social-Information-Infrastructure-Physical_Environment-Time (PMESII-PT) as a framework for data collection.

  1. Political
  1. Military
  1. Economic
  1. No daily social or Infrastructure data collected as they are difficult to measure categorically and even harder to quantify.

  2. Information

  1. Physical Environment
  1. Time

Our various target variables, 2016-01-01-2019-12-31-Bahrain.csv, are provided by The Armed Conflict Location & Event Data (ACLED) Project, which “is a disaggregated data collection, analysis, and crisis mapping project. ACLED collects the dates, actors, locations, fatalities, and modalities of all reported political violence and protest events across Africa, South Asia, Southeast Asia, the Middle East, Central Asia, and the Caucasus, Latin America and the Caribbean, and Southeastern and Eastern Europe and the Balkans. The ACLED team conducts analysis to describe, explore, and test conflict scenarios, and makes both data and analysis open for free use by the public.” Source: https://acleddata.com/about-acled/

For the initial stages of this analysis, the team will focus its efforts to predict the total number of riots, which is an aggregated sum of mob violence and violent demonstrations. Further analysis below conducts dimensionality reduction and attempting to predict that new target variable.

Exploratory Data Analysis (EDA): Part 1

The ACLED data set provided several different event and sub-event categories for study and analysis. (Notionally) 5th Fleet is interested in any event that could potentially cause a disruption in base operations. Analyzing the data can enable the team to identify patterns, discover anomalies, and generate hypotheses. Analyzing the raw data led the team to categorize social media posts, month, and weekday as discrete variables and holiday as binary. The team elects to conduct a more thorough EDA once it selects an event to model.

# Bar chart presenting event types over time along with their count
plot1 <- ggplot(by_date_type, aes(fill=event_type, y=count, x=event_date)) + 
    geom_bar(position="stack", stat="identity") + scale_x_date(date_breaks='year', date_labels='%y') + 
  labs(title='Bahrain Events (01-JAN-16 to 31-DEC-19)', x='Date', y='Count') +
  theme(legend.title=element_blank(), legend.position='right', legend.text=element_text(size=9))
  
ggplotly(plot1, tooltip=c('y', 'x', 'event_type')) 
16171819200204060
BattlesExplosions/Remote violenceProtestsRiotsStrategic developmentsViolence against civiliansBahrain Events (01-JAN-16 to 31-DEC-19)DateCount

From Plot1 we can see that protests make up the majority of our events. However, these protests can be divided into peaceful or violent and this warrants further analysis. Although violent and nonviolent protests occur on a near daily basis in the Kingdom of Bahrain, these protests are localized to predominately Shi’ite neighborhoods such as Diraz, Sitra, and Ma’ameer. Despite being predominantly Shi’ite districts, these neighborhoods present distinct social, economic, and political challenges when identifying and understanding the civil domain.

# Bar chart presenting sub event types over time along with their count
plot2 <- ggplot(by_date_type, aes(fill=sub_event_type, y=count, x=event_date)) +
  geom_bar(position="stack", stat="identity") + scale_x_date(date_breaks='years', date_labels='%y') +
  theme(legend.title=element_blank(), legend.position='bottom', legend.text=element_text(size=9)) + 
  labs(title='Bahrain Sub-events (01-JAN-16 to 31-DEC-19)', x='Date', y='Count')
  
ggplotly(plot2, tooltip=c('y', 'x', 'sub_event_type')) 
16171819200204060
AgreementArmed clashArrestsAttackChange to group/activityDisrupted weapons useExcessive force against protestersGrenadeLooting/property destructionMob violenceOtherPeaceful protestProtest with interventionRemote explosive/landmine/IEDSexual violenceShelling/artillery/missile attackViolent demonstrationBahrain Sub-events (01-JAN-16 to 31-DEC-19)DateCount

From Plot2 we can see that the majority of events are peaceful demonstrations. However, this conclusion leads to further discussion about peaceful protests turning violent or action caused by the government against civilians. The code analyzing this phenomenon is further below (EDA Question #3).

# GG plot for riots during our time frame
plot4 <- master_df %>%
  ggplot( aes(x=event_date, y=riot_count)) +
    geom_area(fill="#69b3a2", alpha=0.5) +
    geom_line(color="#69b3a2") +
    labs(title='Bahrain Riots (01-JAN-16 to 31-DEC-19)', x='Date', y='Count') + 
  scale_x_date(date_breaks='years', date_labels='%Y')
    
ggplotly(plot4, tooltip=c('y', 'x', 'event_type'))
201620172018201920200102030
Bahrain Riots (01-JAN-16 to 31-DEC-19)DateCount

In Plot4 we can see that although violent demonstrations occur nearly every day, there appears to be a cyclical nature for high volume events. Terrorists and insurgents often use a cycle to carry out their attacks. According to the Director of National Intelligence, the terrorism cycle roughly consists of re-attack surveillance, training, and rehearsals. Conversely, the spike in protests may be in response to government actions against civilians or simply pure coincidence. In the R chunk below we will explore the seasonality nature of these demonstrations (EDA Question #1). A time series is said to be stationary if it holds the following conditions true:

  1. The mean value of time-series is constant over time, which implies, the trend component is nullified.
  2. The variance does not increase over time.
  3. Seasonality effect is minimal. Source: http://r-statistics.co/Time-Series-Analysis-With-R.html

In order to determine seasonality, we will utilize the Webel and Ollech (2019) test, which is an overall test for seasonality of a given time series. Source: https://cran.r-project.org/web/packages/seastests/seastests.pdf

# EDA Question #1: Is there evidence of a terrorist/insurgent attack cycle via seasonality analysis?

# Code sourced from https://cran.r-project.org/web/packages/seastests/vignettes/seastests-vignette.html

# Creates a dataframe for conversion into a time series object
tsData <- cbind(ts(master_df[, 1]), master_df[, 223])

# Creates a time series object
ts_obj <- ts_ts(ts_long(tsData))

# Performs a simple Webel and Ollech seasonality test
isSeasonal(ts_obj, test = 'wo', freq = 7)

# Checks for seasonality within a month
# Create a sequence of days from 1 to 30
day_seq <- seq(1,30, 1)
i <- 1 # Initialize a counter

# Create empty data containers for storage
kw_df <- data.frame() 

# Iterate over the length of 30 days
while(i <= length(day_seq)){
  # store the results of the Kruskal Wallace test
  kw_df[i,1] <- kw(ts_obj, freq = i, diff = T, residuals = F, autoarima = T)$Pval
  kw_df[i,2] <- i
  
  i = i + 1 # Increment the counter
}

# Change column names
names(kw_df)[1] <- "P-value"
names(kw_df)[2] <- "Day"

# Produce the trend and remainder data.
forecast::mstl(master_df$riot_count)
## [1] "Day  1  has p-value 1"
## [1] "Day  2  has p-value 0.568633129024278"
## [1] "Day  3  has p-value 0.0895051025727108"
## [1] "Day  4  has p-value 0.874688632633448"
## [1] "Day  5  has p-value 0.031469940685127"
## [1] "Day  6  has p-value 0.167023568005817"
## [1] "Day  7  has p-value 0.391981114540598"
## [1] "Day  8  has p-value 0.621282101342723"
## [1] "Day  9  has p-value 0.510489110739845"
## [1] "Day  10  has p-value 0.0849644787543207"
## [1] "Day  11  has p-value 0.159815642799602"
## [1] "Day  12  has p-value 0.516291197778634"
## [1] "Day  13  has p-value 0.0445863214864214"
## [1] "Day  14  has p-value 0.56186090500358"
## [1] "Day  15  has p-value 0.196047602054947"
## [1] "Day  16  has p-value 0.0558027515517882"
## [1] "Day  17  has p-value 0.953299557161738"
## [1] "Day  18  has p-value 0.657759469469912"
## [1] "Day  19  has p-value 0.378992512650113"
## [1] "Day  20  has p-value 0.212477444683822"
## [1] "Day  21  has p-value 0.818646678140778"
## [1] "Day  22  has p-value 0.23977688981302"
## [1] "Day  23  has p-value 0.570585359465244"
## [1] "Day  24  has p-value 0.308238021403119"
## [1] "Day  25  has p-value 0.123195116716367"
## [1] "Day  26  has p-value 0.0594705662306877"
## [1] "Day  27  has p-value 0.742004623410293"
## [1] "Day  28  has p-value 0.16403076361633"
## [1] "Day  29  has p-value 0.878928910906065"
## [1] "Day  30  has p-value 0.379182639938412"
## [1] "The estimated cycle is 5 days"

By utilizing the Kruskal-Wallace Test with 30 groups, we are able to determine if there are differences between two or more groups of an independent variable on a continuous or ordinal dependent variable (source: https://statistics.laerd.com/spss-tutorials/kruskal-wallis-h-test-using-spss-statistics.php). From the output above, we are able to conclude that there is sufficient evidence that riots occur on a 5 or 13-day cycle due to their low p-value. Further analysis of Plot5 contributes to this conclusion as it’s trending mean is approximately zero with an almost constant variance. As these attacks appear to be seasonal, the team lacks sufficient evidence to pinpoint the exact duration of the insurgent planning cycle.

Speculation amongst the analyst team at 5th Fleet (notional) believes that these protests are localized within predominately Shi’a neighborhoods who believe they are oppressed by the government. Identifying the towns with the most disruptions will enable effective force protection measures and ensuring minimal risk to the base.

EDA Question #2: Where are the protests occurring and what is the subsequent risk to the base and personnel?

# EDA Question #2: Where are the protests occurring and what is the risk to the base?

# Aggregate protests by location and exclude non-violent protests
by_location_type <- raw_df %>% group_by(location, sub_event_type != 'Peaceful protest') %>% dplyr::summarise(count = n())

# Change column name to Boolean_Violence
names(by_location_type)[2] <- "Boolean_Violence"

# Filter data that only accounts for violent protests only
filtered_location_df <- data.frame(by_location_type[by_location_type$Boolean_Violence == TRUE,])

# Sort the dataframe above with the highest number of events
filtered_location_df <- head(filtered_location_df[order(filtered_location_df$count, decreasing = TRUE),],10)

# Top 10 locations for violent/nonviolent protests
filtered_location_df['longitude'] <- NA
filtered_location_df['latitude'] <- NA
violent_locs <- filtered_location_df %>% pull(location)
for (loc in violent_locs) {
  filtered_location_df[filtered_location_df$location==loc,]['longitude'] <- raw_df[raw_df$location==loc,]$longitude[1]
  filtered_location_df[filtered_location_df$location==loc,]['latitude'] <- raw_df[raw_df$location==loc,]$latitude[1]
}

# All locations for violent/nonviolent protests
by_loc_type_df <- as.data.frame(by_location_type)
unique_locs <- by_loc_type_df %>% pull(location)
unique_locs <- unique(unique_locs)
by_loc_type_df['longitude'] <- NA
by_loc_type_df['latitude'] <- NA
for (loc in unique_locs) {
  by_loc_type_df[by_loc_type_df$location==loc,]['longitude'] <- raw_df[raw_df$location==loc,]$longitude[1]
  by_loc_type_df[by_loc_type_df$location==loc,]['latitude'] <- raw_df[raw_df$location==loc,]$latitude[1]
}

# Bar plot for top 10 violent demonstrations
plot6 <- ggplot(filtered_location_df, aes(x=reorder(location, -count), y=count, fill=location)) + geom_bar(stat='identity') + labs(title='Top 10 Violent Demonstrations (01-JAN-16 to 31-DEC-19)', x='Location', y='Count') + geom_text(aes(label=count), vjust=1.6, color="white", size=3) + scale_fill_brewer(palette='Paired') + theme(legend.position='none', axis.text.x=element_text(angle=45, hjust = 1))

plot6

Sys.setenv('MAPBOX_TOKEN' = 'pk.eyJ1IjoiZGF2aWR3cmVuIiwiYSI6ImNrYWxqNG8xZTB0bW0ycmxiNGI2NW9jenAifQ.XspCPRtcou2DCia28IF3og')

# Bubble map for where violent demonstrations are occurring
plot7 <- plot_mapbox(maps::canada.cities) %>%
  add_markers(data= by_loc_type_df[by_loc_type_df$Boolean_Violence==TRUE,],
    x = ~longitude,
    y = ~latitude, split=~location,
    size=~count,
    color = ~count,
    text=~paste('Count:', count)) %>%
  layout(
    showlegend=FALSE, title='Bahrain Riots (01-JAN-16 to 31-DEC-19)',
    mapbox=list(center=list(lat=median(raw_df$latitude), lon=median(raw_df$longitude)), zoom=9))

plot7
# Bubble map for where non-violent demonstrations are occurring
plot8 <- plot_mapbox(maps::canada.cities) %>%
  add_trace(data= by_loc_type_df[by_loc_type_df$Boolean_Violence==FALSE,], type='scattergeo',
    x = ~longitude,
    y = ~latitude, split=~location,
    size=~count,
    color = ~count,
    text=~paste('Count:', count)) %>%
  layout(
    showlegend=FALSE, title='Non-violent Demonstrations (01-JAN-17 to 31-DEC-19)',
    mapbox=list(center=list(lat=median(raw_df$latitude), lon=median(raw_df$longitude)), zoom=9))

plot8

The bar graph compares the number of riots of the top ten cities in Bahrain that have the highest count; Nuwaidrat experienced the most violent demonstrations at 387. From the bubble map of violent demonstrations, we see that not very many protests actually occur around NSA Bahrain; there were only 91 violent protests northwest of NSA Bahrain in Juffair. Most of the protests occurred in the south-southwest region of Bahrain, largely focused on the cities of Sitra and Ma’ameer. Conversely, large numbers of peaceful protests are more spread out through the country. While Ma’ameer still experienced many peaceful protests, cities like Budaiya, Jid Hafs, and A’ali have also experienced a large amount. Additionally, we see that not very many peaceful protests occurred around NSA Bahrain, with the closest city being Juffair at 25 protests.

# EDA Question #3: What is the relationship between nonviolent protests and violent demonstrations?

# This code determines the correlation between peaceful protests and violent demonstrations
cor.test(master_df$riot_count, master_df$Peaceful_protest_count)
## 
##  Pearson's product-moment correlation
## 
## data:  master_df$riot_count and master_df$Peaceful_protest_count
## t = 21.132, df = 1459, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4438215 0.5224032
## sample estimates:
##       cor 
## 0.4840878
# Create a dataframe for correlation plotting
corr_df <- data.frame(master_df$riot_count, master_df$Peaceful_protest_count)

# Change column names
names(corr_df)[1] <- "Riots"
names(corr_df)[2] <- "Peaceful_Protests"

# Code sourced from: http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r
# Create a plot
plot9 <- ggscatter(corr_df, x = "Riots", y = "Peaceful_Protests", 
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Violent Demonstration Count", ylab = "Peaceful Protest Count", title = 'Violent and Peaceful Demonstrations')

plot9
## `geom_smooth()` using formula 'y ~ x'

# Shapiro-Wilk normality test for peaceful protests
shapiro.test(corr_df$Peaceful_Protests)
## 
##  Shapiro-Wilk normality test
## 
## data:  corr_df$Peaceful_Protests
## W = 0.72153, p-value < 2.2e-16
# Shapiro-Wilk normality test for violent demonstrations
shapiro.test(corr_df$Riots)
## 
##  Shapiro-Wilk normality test
## 
## data:  corr_df$Riots
## W = 0.71688, p-value < 2.2e-16

We see the correlation calculated above is 0.4840878, which is not above the widely accepted standard of 0.7 or -0.7. However, given the nature of the events measured, we see that the correlation coefficient is insightful: not every non-violent protest will become violent. Human actions from both parties and other external influences can either escalate or de-escalate a non-violent protest.

The team scrutinized a subset of the data and looked at individual events and at this point believes that non-violent events and violent demonstrations are not double-counted implying they are IID. Furthermore, the team believes that both peaceful protests and violent demonstrations are both normal via the Shapiro-Wilk normality test. As a result, the team believes that incorporating non-violent events may serve as a useful predictor variable.

From the data above we can see that battles and attacks do occur between Ministry of Interior (MOI) forces and the civilian population. The late Martin Luther King Jr stated that “violence begets violence.” Research into this topic leads the team to its fourth EDA question: Does violence beget violence?

# EDA Question 4: Does violence beget violence?

# Compute the correlation between violence against civilians and violent demonstrations
cor.test(master_df$riot_count, master_df$VAC_count)
## 
##  Pearson's product-moment correlation
## 
## data:  master_df$riot_count and master_df$VAC_count
## t = 0.97831, df = 1459, p-value = 0.3281
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.02571446  0.07678795
## sample estimates:
##        cor 
## 0.02560404

From the correlation coefficient, we can see there is no discernible relationship between violence against civilians and violent demonstrations on the same day. However, additional analysis will need to occur to determine whether or not violence against civilians serves has an impact in the coming days as news of an attack would spread amongst the population and incite emotional responses.

EDA: Part 2

# Determine elementary statistical values of the column violent demonstrations
mean_of_df <- mean(master_df$riot_count)
median_of_df <- median(master_df$riot_count)
var_of_df <- var(master_df$riot_count)
sd_of_df <- sqrt(master_df$riot_count)

# Generate a boxplot of violent demonstrations per day
plot12 <- boxplot(riot_count~Numeric_Weekday,data=master_df, main="Riots During the Week", xlab="Weekday", ylab="Count", axes = FALSE); par(las=2); axis(1, at= 1:7, labels= c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')); axis(2)

plot12
## $stats
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    0    0    0    0    0    0    0
## [2,]    0    0    0    0    1    1    1
## [3,]    1    1    1    1    2    2    2
## [4,]    3    2    3    3    3    3    3
## [5,]    7    5    7    7    6    6    6
## attr(,"class")
##         1 
## "integer" 
## 
## $n
## [1] 209 209 209 208 208 209 209
## 
## $conf
##           [,1]      [,2]      [,3]      [,4]     [,5]     [,6]     [,7]
## [1,] 0.6721273 0.7814182 0.6721273 0.6713401 1.780893 1.781418 1.781418
## [2,] 1.3278727 1.2185818 1.3278727 1.3286599 2.219107 2.218582 2.218582
## 
## $out
##  [1]  8  8 11 27 12 18 12  9 10  9  9  8 14  8 12  7 13  9 12  9  7  6 10  6  7
## [26]  7 12 10 14 29 15 10 16 11  8 10 14  9 20  9  7 10  8 12 14  7 12  9  9 20
## [51]  7 10  9 12  9  8  9  9  9  7  9  8  8  7  7 13  9 10  8 11  9 10 11  9  9
## [76]  7 13 10 18 13  7 13
## 
## $group
##  [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 4 4 4 4 4 4
## [39] 4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7
## [77] 7 7 7 7 7 7
## 
## $names
## [1] "1" "2" "3" "4" "5" "6" "7"
# Generate a boxplot of violent demonstrations per month
plot13 <- boxplot(riot_count~Month,data=master_df, main="Riots per Month", xlab="Month", ylab="Count", axes = FALSE); par(las=2); axis(1, at= 1:11, labels= month.abb[1:11]); axis(2)

plot13
## $stats
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]  0.0    0    0    0    0    0    0  0.0    0     0     0     0
## [2,]  1.0    1    1    1    1    1    0  0.5    0     0     0     0
## [3,]  2.5    2    2    2    2    2    1  1.0    1     1     1     2
## [4,]  5.0    4    3    3    3    3    3  3.0    2     2     2     3
## [5,] 11.0    8    6    6    6    6    7  6.0    5     5     5     7
## attr(,"class")
##         1 
## "integer" 
## 
## $n
##  [1] 124 113 124 120 124 120 124 124 120 124 120 124
## 
## $conf
##          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]      [,7]      [,8]
## [1,] 1.932447 1.554098 1.716224 1.711533 1.716224 1.711533 0.5743354 0.6452795
## [2,] 3.067553 2.445902 2.283776 2.288467 2.283776 2.288467 1.4256646 1.3547205
##           [,9]     [,10]     [,11]    [,12]
## [1,] 0.7115328 0.7162236 0.7115328 1.574335
## [2,] 1.2884672 1.2837764 1.2884672 2.425665
## 
## $out
##  [1] 27 13 14 12 12 12 12 13 14  9 29  9 12 20  9 20 14  7  8 10  7 18 11  8  9
## [26] 12  7  9 11 18 13 13  9  9  7 11  9  9 12 15 10 12  9  9  9  9  7  7 10 14
## [51]  9  9  8 13 10 10  7  8  7  7  7  8  7  9  7  6  6  7  6  6  7  9 16 10
## 
## $group
##  [1]  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  3  3  3  3  3  3  4  4  4
## [26]  4  4  4  4  4  4  4  4  4  4  5  5  5  5  5  5  5  5  5  6  6  6  6  6  6
## [51]  6  6  7  7  7  7  8  8  8  8  8  9  9  9 10 10 10 11 11 11 11 12 12 12
## 
## $names
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12"
# Generate a boxplot of violent demonstrations during holidays.  1 being yes, it is a holiday.
plot14 <- boxplot(riot_count~Holiday,data=master_df, main="Riots on Holidays", xlab="Holiday?", ylab="Count", axes = FALSE); par(las=2); axis(1, at= 1:2, labels=c("No", "Yes")); axis(2)

plot14
## $stats
##      [,1] [,2]
## [1,]    0    0
## [2,]    0    0
## [3,]    1    1
## [4,]    3    3
## [5,]    7    7
## attr(,"class")
##         0 
## "integer" 
## 
## $n
## [1] 1400   61
## 
## $conf
##           [,1]      [,2]
## [1,] 0.8733182 0.3931052
## [2,] 1.1266818 1.6068948
## 
## $out
##  [1] 10  8  9  8 10 10  8  8 14 11  8  9 12 11  8 16 10 27 13 14 11 12 12  9 12
## [26] 14  9 29  8  9 12  8  9 11 18  9  9 12 15 10 12  9  9  9  9 10 14  9  8  8
## [51]  9 12 13 20  9 10 18 13 13  9  9  9 20 13 10 10  9 10
## 
## $group
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
## 
## $names
## [1] "0" "1"
# Plots temperature versus protests
plot15 <- master_df %>%
  ggplot( aes(x=TEMP, y=riot_count)) +
    geom_area(fill="#69b3a2", alpha=0.5) +
    geom_line(color="#69b3a2") +
    labs(title='Bahrain Riots (01-JAN-16 to 31-DEC-19)', x='TEMP', y='Count')

ggplotly(plot15, tooltip=c('y', 'x', 'event_type')) 
5060708090100010203040
Bahrain Riots (01-JAN-16 to 31-DEC-19)TEMPCount

Based upon the plots above, we notice several trends. Holidays and days of the week tend to not have an effect on the number of riots. From the monthly box plot we can see a downward trend from the beginning of the year to the end. Upon inspection of the temperature data, there are more naturally more riots when the temperature is more suitable for action but the plot appears to be uniform in nature.

Once our exploratory data analysis is complete, we begin model formulation. Initially, the team utilizes Reg Subsets to determine the optimal variable selection for predicting ‘riot_count.’ However, the formulations below assume that the daily data is independent, which is often a poor assumption.

Non-Time Series Models

1. Reg Subsets

# id: model id
# object: regsubsets object
# data: data used to fit regsubsets
# outcome: outcome variable
get_model_formula <- function(id, object, outcome){
  # get models data
  models <- summary(object)$which[id,-1]
  # Get outcome variable
  form <- as.formula(object$call[[2]])
  outcome <- all.vars(form)[1]
  # Get model predictors
  predictors <- names(which(models == TRUE))
}
# Remove event_date to simplify 
master_df$event_date <- NULL

# Determine which coefficients produce the most accurate model
regfit.backward.full <- regsubsets(riot_count ~ . -(Armed_clash_count + Armed_clash_fatalities + battle_count + battle_fatalities + SAMA_count + SAMA_fatalities + RELIED_count + RELIED_fatalities + Grenade_count + Grenade_fatalities + explosion_count + explosions_fatalities + Sexual_violence_count + sexual_violence_fatalities + Attack_count + Attack_fatalities + VAC_count + VAC_fatalities + EFAP_count + EFAP_fatalities + PWI_count + PWI_fatalities + Peaceful_protest_count + Peaceful_protest_fatalities + protest_count + protest_fatalities + mob_violence_count + mob_violence_fatalities + violent_demonstration_count + violent_demonstration_fatalities + riot_fatalities), data = master_df, nvmax = 100, really.big =  TRUE, method = 'backward')

# Produce a summary of the full model
res.backward.sum <- summary(regfit.backward.full)

# Determine which model has the highest R-squared
backward.adjr2 <- which.max(res.backward.sum$adjr2)

# Determine which model has the lowest Mallow's CP
backward.cp <- which.min(res.backward.sum$cp)

# Determine which model has the lowest BIC
backward.bic <- which.min(res.backward.sum$bic)
## [1] "The backwards model with the highest R-squared is 88 with a value of 0.6578"
## [1] "The backwards model with the lowest Mallow's CP is 49 with a value of -15.9535"
## [1] "The backwards model with the lowest BIC is 13 with a value of -1370.2913"
## Backwards model

# Determines which features we should select according to our three measures of effectiveness: R-squared, Mallow's CP, and BIC
get_model_formula(backward.adjr2, regfit.backward.full, "riot_count")
get_model_formula(backward.cp, regfit.backward.full, "riot_count")
get_model_formula(backward.bic, regfit.backward.full, "riot_count")
# Determine which coefficients produce the most accurate model
regfit.forward.full <- regsubsets(riot_count ~ . -(Armed_clash_count + Armed_clash_fatalities + battle_count + battle_fatalities + SAMA_count + SAMA_fatalities + RELIED_count + RELIED_fatalities + Grenade_count + Grenade_fatalities + explosion_count + explosions_fatalities + Sexual_violence_count + sexual_violence_fatalities + Attack_count + Attack_fatalities + VAC_count + VAC_fatalities + EFAP_count + EFAP_fatalities + PWI_count + PWI_fatalities + Peaceful_protest_count + Peaceful_protest_fatalities + protest_count + protest_fatalities + mob_violence_count + mob_violence_fatalities + violent_demonstration_count + violent_demonstration_fatalities + riot_fatalities), data = master_df, nvmax = 100, really.big =  TRUE, method = 'forward')

# Produce a summary of the full model
res.forward.sum <- summary(regfit.forward.full)

# Determine which model has the highest R-squared
forward.adjr2 <- which.max(res.forward.sum$adjr2)

# Determine which model has the lowest Mallow's CP
forward.cp <- which.min(res.forward.sum$cp)

# Determine which model has the lowest BIC
forward.bic <- which.min(res.forward.sum$bic)
## [1] "The forwards model with the highest R-squared is 89 with a value of 0.6555"
## [1] "The forwards model with the lowest Mallow's CP is 60 with a value of -8.2668"
## [1] "The forwards model with the lowest BIC is 9 with a value of -1358.205"
## Forwards model

# Determines which features we should select according to our three measures of effectiveness: R-squared, Mallow's CP, and BIC
get_model_formula(forward.adjr2, regfit.forward.full, "riot_count")
get_model_formula(forward.cp, regfit.forward.full, "riot_count")
get_model_formula(forward.bic, regfit.forward.full, "riot_count")

Utilizing the Reg Subsets, we are able to see that assess our three measures of effectiveness for both forward and backwards models: R-squared, Mallow’s CP, and BIC. Despite their relatively undesired performance measures, these models provide coefficients that serve as a starting point for model formulation.

AdaBoost is an ensemble methods for classification. We will use AdaBoost with riot count as the target variable and our coefficients as determined by the backward.adjr2 model from Reg Subsets.

2. ADABOOST

# Set riot_count as categorical for processing
master_df$riot_count <- as.factor(master_df$riot_count)

# Formulate the model using backward.adjr2 from Reg Subsets
cvmodel = boosting.cv(riot_count ~ al_wafa_Total_Posts + al_wafa_Favorites + Alwatan_Live_Retweets + 
    bahrain_moi_Favorites + bahrain_moi_Retweets + bh14feb2011_Retweets + 
    bna_ar_Total_Posts + bna_ar_Favorites + Coalition14_Total_Posts + 
    Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Favorites + 
    duraz_youth_Retweets + feb14revolution_Total_Posts + feb14revolution_Favorites + 
    feb14revolution_Retweets + Iran_Total_Posts + Iran_Favorites + 
    Iran_Retweets + khalidalkhalifa_Total_Posts + NABEELRAJAB_Favorites + 
    NABEELRAJAB_Retweets + trump_Total_Posts + USEmbassyManama_Favorites + 
    Holiday + Week_Number + TEMP + DEWP + zinc_Close + zinc_Open + 
    WTI_Close + wheat_Open + wheat_Low + tin_Open + tin_High + 
    tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low + 
    soybean_Close + soybean_Low + silver_Close + silver_Low + 
    rice_Open + nickel_Open + nickel_Low + natural_gas_Low + 
    monero_Close + monero_High + monero_Low + litecoin_Close + 
    litecoin_Low + lead_Open + lead_Low + Iron_USA_Open + Iron_USA_Low + 
    Gold_Open + Gold_High + corn_Close + corn_Open + corn_High + 
    corn_Low + copper_Open + copper_High + coffee_Open + live_cattle_Close + 
    live_cattle_Low + feed_cattle_Close + feed_cattle_Open + 
    feed_cattle_High + Brent_Close + Brent_Low + Bitcoin_Close + 
    Bitcoin_Open + Bitcoin_High + Bitcoin_Low + BHD_EUR_Close + 
    BHD_EUR_Open + BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + 
    BAX_Low + JD + Numeric_Weekday + Binary_Weekday, data = master_df, boos = TRUE, v = 10)
## i:  1 Tue Nov 10 15:34:21 2020 
## i:  2 Tue Nov 10 15:36:05 2020 
## i:  3 Tue Nov 10 15:37:41 2020 
## i:  4 Tue Nov 10 15:39:22 2020 
## i:  5 Tue Nov 10 15:41:06 2020 
## i:  6 Tue Nov 10 15:42:49 2020 
## i:  7 Tue Nov 10 15:44:52 2020 
## i:  8 Tue Nov 10 15:46:54 2020 
## i:  9 Tue Nov 10 15:48:50 2020 
## i:  10 Tue Nov 10 15:50:47 2020
print(cvmodel[-1])
## $confusion
##                Observed Class
## Predicted Class   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
##               0 208  91  43  19   5   3   2   0   0   0   2   0   0   1   0   0
##               1 120 139  92  68  28  20  11   9   3   2   1   2   1   2   1   0
##               2  64  87 113  81  44  28  16  11   3  10   4   1   3   1   1   0
##               3   6  16  13  10  18   3   4   1   2   1   1   0   0   0   0   0
##               4   0   0   0   1   0   1   0   1   1   0   0   0   0   0   1   0
##               5   1   2   1   2   4   5   2   0   1   6   2   1   4   1   1   1
##                Observed Class
## Predicted Class  16  18  20  27  29
##               0   0   0   0   0   0
##               1   0   1   1   0   0
##               2   0   1   1   0   0
##               3   0   0   0   0   0
##               4   0   0   0   0   0
##               5   1   0   0   1   1
## 
## $error
## [1] 0.6748802
# Produce a side-by-side comparison of the original dataframe and the predicted value
data.frame(master_df$riot_count, cvmodel$class)

Create a testing and training split for master_df

# Re-read in the dataframe 
master_df <- read.csv('master_df.csv')

# Delete the event_date column from the dataframe
master_df$event_date <- NULL

# Code sourced from Dr Bassett's Lecture5_MLR.R

# Splits the data into a 80% training and 20% testing bins
#train_inds <- sample(nrow(master_df), .8*nrow(master_df))

# Splits the data into a simulation of a time series dataset
train <- master_df[1:1454, c(1:187, 222)]
test <- master_df[1455:1461, c(1:187, 222)]

# Code sourced from Dr Bassett's Lecture5_MLR.R
# Function designed to compute the RMSE of the testing and training data split
train_and_test_error <- function(model){
  train_rmse <- sqrt(mean(resid(model)^2))
  test_rmse <- sqrt(mean((test$total_master_df - predict(model, newdata=test))^2))
  return(data.frame(train_rmse=train_rmse, test_rmse=test_rmse))
}

In model #3, we create a baseline linear model that we can use to compare our comparison against.

3. Baseline model

# Creates a baseline model with all of the predictors for comparison
lm_baseline <- lm(riot_count ~ AJArabic_Total_Posts + AJArabic_Favorites + AJArabic_Retweets + al_wafa_Total_Posts + al_wafa_Favorites + al_wafa_Retweets + Alwatan_Live_Total_Posts  + Alwatan_Live_Favorites +  Alwatan_Live_Retweets + bahrain_moi_Total_Posts + bahrain_moi_Favorites + bahrain_moi_Retweets + BahrainRights_Total_Posts + BahrainRights_Favorites + BahrainRights_Retweets + BBCArabic_Total_Posts  + BBCArabic_Favorites + BBCArabic_Retweets + bh14feb2011_Total_Posts + bh14feb2011_Favorites + bh14feb2011_Retweets + bna_ar_Total_Posts + bna_ar_Favorites + bna_ar_Retweets + Coalition14_Total_Posts + Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Total_Posts + duraz_youth_Favorites + duraz_youth_Retweets + feb14revolution_Total_Posts + feb14revolution_Favorites +    feb14revolution_Retweets + GDNonline_Total_Posts + GDNonline_Favorites + GDNonline_Retweets + Iran_Total_Posts + Iran_Favorites + Iran_Retweets + IranNW_Total_Posts + IranNW_Favorites + IranNW_Retweets  + khalidalkhalifa_Total_Posts + khalidalkhalifa_Favorites + khalidalkhalifa_Retweets + khamenei_Total_Posts  + khamenei_Favorites + khamenei_Retweets + KUhp2222_Total_Posts + KUhp2222_Favorites +  KUhp2222_Retweets + malarab1_Total_Posts + malarab1_Favorites + malarab1_Retweets + NABEELRAJAB_Total_Posts + NABEELRAJAB_Favorites + NABEELRAJAB_Retweets + netanyahu_Total_Posts + netanyahu_Favorites + netanyahu_Retweets + NSA_Bahrain_Total_Posts + NSA_Bahrain_Favorites + NSA_Bahrain_Retweets + rouhani_Total_Posts + rouhani_Favorites + rouhani_Retweets + trump_Total_Posts + trump_Favorites + trump_Retweets + USEmbassyManama_Total_Posts + USEmbassyManama_Favorites + USEmbassyManama_Retweets + TEMP + DEWP + WDSP + MXSPD + PRCP + zinc_Close + zinc_Open + zinc_High + zinc_Low + WTI_Close + WTI_Open + WTI_High + WTI_Low + wheat_Close + wheat_Open + wheat_High + wheat_Low + tin_Close + tin_Open + tin_High + tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low + soybean_Close + soybean_Open + soybean_High + soybean_Low + silver_Close + silver_Open + silver_High + silver_Low + rice_Close + rice_Open + rice_High + rice_Low + platinum_Close + platinum_Open + platinum_High + platinum_Low + nickel_Close + nickel_Open + nickel_High + nickel_Low + natural_gas_Close + natural_gas_Open + natural_gas_High + natural_gas_Low + monero_Close + monero_Open + monero_High + monero_Low + litecoin_Close + litecoin_Open + litecoin_High + litecoin_Low + lead_Close + lead_Open + lead_High + lead_Low + Iron_USA_Close + Iron_USA_Open + Iron_USA_High + Iron_USA_Low + Gold_Close + Gold_Open + Gold_High + Gold_Low +GBP_BHD_Close + GBP_BHD_Open+ GBP_BHD_High + GBP_BHD_Low + cotton_Close + cotton_Open + cotton_High + cotton_Low +  corn_Close +  corn_Open + corn_High + corn_Low + copper_Close + copper_Open + copper_High + copper_Low + coffee_Close + coffee_Open + coffee_High + coffee_Low + live_cattle_Close + live_cattle_Open + live_cattle_High + live_cattle_Low + feed_cattle_Close + feed_cattle_Open + feed_cattle_High + feed_cattle_Low + Brent_Close + Brent_Open + Brent_High + Brent_Low + Bitcoin_Close + Bitcoin_Open + Bitcoin_High + Bitcoin_Low + BHD_EUR_Close + BHD_EUR_Open + BHD_EUR_High + BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + BAX_Low, data = train)

# Calculate RMSE for future comparison of model effectiveness
lm_baseline_training_RMSE <- as.numeric(train_and_test_error(lm_baseline)[1])
lm_baseline_testing_RMSE <- as.numeric(train_and_test_error(lm_baseline)[2])

# Calculates Mean Absolute Deviation (MAD)
lm_baseline_mad <- mad(residuals(lm_baseline))

# Calculates the Residual Sum of Squares (RSS)
lm_baseline_RSS <- RSS(lm_baseline)

# Produces a summary of the lm_baseline model
lm_baseline_rsquared <- summary(lm_baseline)$r.squared
# Initial and Final Models for Stepwise Regression(AIC/BIC criteria)
initial_model <- lm(riot_count ~ 1, data=train) 
final_model <- lm_baseline

4. Forward Stepwise Regression

# Forward Stepwise Regression

# Code sourced from https://stackoverflow.com/questions/22913774/forward-stepwise-regression

# AIC - Forward
# Code sourced from Dr Bassett's Lecture13.R
AIC_forward <- stepAIC(initial_model, scope=list(lower=initial_model, upper=final_model), direction='forward', trace=FALSE)

#BIC - Forward
# Code sourced from Dr Bassett's Lecture13.R
BIC_forward <- stepAIC(initial_model, scope=list(lower=initial_model, upper=final_model), direction='forward', k=log(nrow(master_df)), trace=FALSE)
# Compute various statistics of the 'Forward' models

# Calculates RMSE
AIC_forward_trainingRMSE <- as.numeric(train_and_test_error(AIC_forward)[1])
AIC_forward_testingRMSE <- as.numeric(train_and_test_error(AIC_forward)[2])

BIC_forward_trainingRMSE <- as.numeric(train_and_test_error(BIC_forward)[1])
BIC_forward_testingRMSE <- as.numeric(train_and_test_error(BIC_forward)[2])

# Calculates Mean Absolute Deviation (MAD)
AIC_forward_MAD <- mad(residuals(AIC_forward))
BIC_forward_MAD <- mad(residuals(BIC_forward))

# Stores R^2 values
AIC_forward_rsquared <- summary(AIC_forward)$r.squared
BIC_forward_rsquared <- summary(BIC_forward)$r.squared

#Calculates RSS
AIC_forward_RSS <- RSS(AIC_forward)
BIC_forward_RSS <- RSS(BIC_forward)

# Store AIC/BIC values
AIC_forward_AIC <- extractAIC(AIC_forward)[2]
BIC_forward_BIC <- extractAIC(BIC_forward)[2]

# Plot residuals of the 'Forward' models
plot(AIC_forward, main = "AIC Forward")

plot(BIC_forward, main = "BIC Forward")

## [1] "Forward AIC training RMSE 2.05722810848444"
## [1] "Forward AIC testing RMSE NaN"
## [1] "Forward BIC training RMSE 2.14231950047345"
## [1] "Forward BIC testing RMSE NaN"
## [1] "Forward AIC MAD 1.38030067089899"
## [1] "Forward BIC MAD 1.41651812817517"
## [1] "Forward AIC adjusted R-squared 0.449534722104852"
## [1] "Forward BIC adjusted R-squared 0.403056108192815"
## [1] "Forward AIC RSS 6153.60061095214"
## [1] "Forward BIC RSS 6673.18075242621"
## [1] "Forward AIC value 2165.71342103413"
## [1] "Forward BIC value 2235.57356324925"

5. Backwards Stepwise Regression

# Backwards Stepwise Regression

# AIC - Backward
# Code sourced from Lecture13.R
AIC_backward <- stepAIC(final_model, scope=list(lower=initial_model, upper=final_model), direction='backward', trace=FALSE)

#BIC - Backward
# Code sourced from Lecture13.R
BIC_backward <- stepAIC(final_model, scope=list(lower=initial_model, upper=final_model), direction='backward', k=log(nrow(master_df)), trace=FALSE)
# Compute various statistics of the 'Backward' models

# Calculates RMSE
AIC_backward_trainingRMSE <- as.numeric(train_and_test_error(AIC_backward)[1])
AIC_backward_testingRMSE <- as.numeric(train_and_test_error(AIC_backward)[2])

BIC_backward_trainingRMSE <- as.numeric(train_and_test_error(BIC_backward)[1])
BIC_backward_testingRMSE <- as.numeric(train_and_test_error(BIC_backward)[2])

# Calculates Mean Absolute Deviation (MAD)
AIC_backward_MAD <- mad(residuals(AIC_backward))
BIC_backward_MAD <- mad(residuals(BIC_backward))

# Stores R^2 values
AIC_backward_rsquared <- summary(AIC_backward)$r.squared
BIC_backward_rsquared <- summary(BIC_backward)$r.squared

#Calculates RSS
AIC_backward_RSS <- RSS(AIC_backward)
BIC_backward_RSS <- RSS(BIC_backward)

# Store AIC/BIC values
AIC_backward_AIC <- extractAIC(AIC_backward)[2]
BIC_backward_BIC <- extractAIC(BIC_backward)[2]

# Plot residuals of the 'Backward' models
plot(AIC_backward, main = "AIC Backward")

plot(BIC_backward, main = "BIC Backward")

## [1] "backward AIC training RMSE 2.00262487992004"
## [1] "backward AIC testing RMSE NaN"
## [1] "backward BIC training RMSE 2.09642345572867"
## [1] "backward BIC testing RMSE NaN"
## [1] "backward AIC MAD 1.42681642214786"
## [1] "backward BIC MAD 1.43781765668812"
## [1] "backward AIC adjusted R-squared 0.478367976290832"
## [1] "backward BIC adjusted R-squared 0.428359420281359"
## [1] "backward AIC RSS 5831.27631966709"
## [1] "backward BIC RSS 6390.31735853048"
## [1] "backward AIC value 2145.48607414822"
## [1] "backward BIC value 2192.59691394075"

6. Both Stepwise Regression

# Both Stepwise Regression

# Code sourced from https://stackoverflow.com/questions/22913774/forward-stepwise-regression

# AIC - Both
#AIC_both <- stepAIC(initial_model, scope=list(lower=initial_model, upper=final_model), direction='both', trace=FALSE)
AIC_both <- stepAIC(initial_model, scope=list(lower=initial_model, upper=final_model), direction='both', trace=FALSE)

#BIC - Both
BIC_both <- stepAIC(initial_model, scope=list(lower=initial_model, upper=final_model), direction='both', k=log(nrow(master_df)), trace=FALSE)
# Compute various statistics of the 'both' models

# Calculates RMSE
AIC_both_trainingRMSE <- as.numeric(train_and_test_error(AIC_both)[1])
AIC_both_testingRMSE <- as.numeric(train_and_test_error(AIC_both)[2])

BIC_both_trainingRMSE <- as.numeric(train_and_test_error(BIC_both)[1])
BIC_both_testingRMSE <- as.numeric(train_and_test_error(BIC_both)[2])

# Calculates Mean Absolute Deviation (MAD)
AIC_both_MAD <- mad(residuals(AIC_both))
BIC_both_MAD <- mad(residuals(BIC_both))

# Stores R^2 values
AIC_both_rsquared <- summary(AIC_both)$r.squared
BIC_both_rsquared <- summary(BIC_both)$r.squared

#Calculates RSS
AIC_both_RSS <- RSS(AIC_both)
BIC_both_RSS <- RSS(BIC_both)

# Store AIC/BIC values
AIC_both_AIC <- extractAIC(AIC_both)[2]
BIC_both_BIC <- extractAIC(BIC_both)[2]

# Plot residuals
plot(AIC_both, main = "AIC Both")

plot(BIC_both, main = "BIC Both")

## [1] "both AIC training RMSE 2.05867628326979"
## [1] "both AIC testing RMSE NaN"
## [1] "both BIC training RMSE 2.14231950047345"
## [1] "both BIC testing RMSE NaN"
## [1] "both AIC MAD 1.39953064378751"
## [1] "both BIC MAD 1.41651812817517"
## [1] "both AIC adjusted R-squared 0.448759455119011"
## [1] "both BIC adjusted R-squared 0.403056108192815"
## [1] "both AIC RSS 6162.26724913859"
## [1] "both BIC RSS 6673.18075242621"
## [1] "both AIC value 2163.75977199312"
## [1] "both BIC value 2235.57356324925"

7. Elastic Net Models (0.25, 0.5, 0.75)

# Compute elastic models of 0.25, 0.5, and 0.75
model_cvglm_25 <- cv.glmnet(x=model.matrix(riot_count ~ (AJArabic_Total_Posts + AJArabic_Favorites + AJArabic_Retweets + al_wafa_Total_Posts + al_wafa_Favorites + al_wafa_Retweets + Alwatan_Live_Total_Posts  + Alwatan_Live_Favorites +  Alwatan_Live_Retweets + bahrain_moi_Total_Posts + bahrain_moi_Favorites + bahrain_moi_Retweets + BahrainRights_Total_Posts + BahrainRights_Favorites + BahrainRights_Retweets + BBCArabic_Total_Posts  + BBCArabic_Favorites + BBCArabic_Retweets + bh14feb2011_Total_Posts + bh14feb2011_Favorites + bh14feb2011_Retweets + bna_ar_Total_Posts + bna_ar_Favorites + bna_ar_Retweets + Coalition14_Total_Posts + Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Total_Posts + duraz_youth_Favorites + duraz_youth_Retweets + feb14revolution_Total_Posts + feb14revolution_Favorites +    feb14revolution_Retweets + GDNonline_Total_Posts + GDNonline_Favorites + GDNonline_Retweets + Iran_Total_Posts + Iran_Favorites + Iran_Retweets + IranNW_Total_Posts + IranNW_Favorites + IranNW_Retweets  + khalidalkhalifa_Total_Posts + khalidalkhalifa_Favorites + khalidalkhalifa_Retweets + khamenei_Total_Posts  + khamenei_Favorites + khamenei_Retweets + KUhp2222_Total_Posts + KUhp2222_Favorites +  KUhp2222_Retweets + malarab1_Total_Posts + malarab1_Favorites + malarab1_Retweets + NABEELRAJAB_Total_Posts + NABEELRAJAB_Favorites + NABEELRAJAB_Retweets + netanyahu_Total_Posts + netanyahu_Favorites + netanyahu_Retweets + NSA_Bahrain_Total_Posts + NSA_Bahrain_Favorites + NSA_Bahrain_Retweets + rouhani_Total_Posts + rouhani_Favorites + rouhani_Retweets + trump_Total_Posts + trump_Favorites + trump_Retweets + USEmbassyManama_Total_Posts + USEmbassyManama_Favorites + USEmbassyManama_Retweets + TEMP + DEWP + WDSP + MXSPD + PRCP + zinc_Close + zinc_Open + zinc_High + zinc_Low + WTI_Close + WTI_Open + WTI_High + WTI_Low + wheat_Close + wheat_Open + wheat_High + wheat_Low + tin_Close + tin_Open + tin_High + tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low + soybean_Close + soybean_Open + soybean_High + soybean_Low + silver_Close + silver_Open + silver_High + silver_Low + rice_Close + rice_Open + rice_High + rice_Low + platinum_Close + platinum_Open + platinum_High + platinum_Low + nickel_Close + nickel_Open + nickel_High + nickel_Low + natural_gas_Close + natural_gas_Open + natural_gas_High + natural_gas_Low + monero_Close + monero_Open + monero_High + monero_Low + litecoin_Close + litecoin_Open + litecoin_High + litecoin_Low + lead_Close + lead_Open + lead_High + lead_Low + Iron_USA_Close + Iron_USA_Open + Iron_USA_High + Iron_USA_Low + Gold_Close + Gold_Open + Gold_High + Gold_Low +GBP_BHD_Close + GBP_BHD_Open+ GBP_BHD_High + GBP_BHD_Low + cotton_Close + cotton_Open + cotton_High + cotton_Low +  corn_Close +  corn_Open + corn_High + corn_Low + copper_Close + copper_Open + copper_High + copper_Low + coffee_Close + coffee_Open + coffee_High + coffee_Low + live_cattle_Close + live_cattle_Open + live_cattle_High + live_cattle_Low + feed_cattle_Close + feed_cattle_Open + feed_cattle_High + feed_cattle_Low + Brent_Close + Brent_Open + Brent_High + Brent_Low + Bitcoin_Close + Bitcoin_Open + Bitcoin_High + Bitcoin_Low + BHD_EUR_Close + BHD_EUR_Open + BHD_EUR_High + BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + BAX_Low) ,data = master_df), y = master_df$riot_count, nfolds = 10, nlambda = 200, alpha = 0.25)

model_cvglm_50 <- cv.glmnet(x=model.matrix(riot_count ~ (AJArabic_Total_Posts + AJArabic_Favorites + AJArabic_Retweets + al_wafa_Total_Posts + al_wafa_Favorites + al_wafa_Retweets + Alwatan_Live_Total_Posts  + Alwatan_Live_Favorites +  Alwatan_Live_Retweets + bahrain_moi_Total_Posts + bahrain_moi_Favorites + bahrain_moi_Retweets + BahrainRights_Total_Posts + BahrainRights_Favorites + BahrainRights_Retweets + BBCArabic_Total_Posts  + BBCArabic_Favorites + BBCArabic_Retweets + bh14feb2011_Total_Posts + bh14feb2011_Favorites + bh14feb2011_Retweets + bna_ar_Total_Posts + bna_ar_Favorites + bna_ar_Retweets + Coalition14_Total_Posts + Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Total_Posts + duraz_youth_Favorites + duraz_youth_Retweets + feb14revolution_Total_Posts + feb14revolution_Favorites +    feb14revolution_Retweets + GDNonline_Total_Posts + GDNonline_Favorites + GDNonline_Retweets + Iran_Total_Posts + Iran_Favorites + Iran_Retweets + IranNW_Total_Posts + IranNW_Favorites + IranNW_Retweets  + khalidalkhalifa_Total_Posts + khalidalkhalifa_Favorites + khalidalkhalifa_Retweets + khamenei_Total_Posts  + khamenei_Favorites + khamenei_Retweets + KUhp2222_Total_Posts + KUhp2222_Favorites +  KUhp2222_Retweets + malarab1_Total_Posts + malarab1_Favorites + malarab1_Retweets + NABEELRAJAB_Total_Posts + NABEELRAJAB_Favorites + NABEELRAJAB_Retweets + netanyahu_Total_Posts + netanyahu_Favorites + netanyahu_Retweets + NSA_Bahrain_Total_Posts + NSA_Bahrain_Favorites + NSA_Bahrain_Retweets + rouhani_Total_Posts + rouhani_Favorites + rouhani_Retweets + trump_Total_Posts + trump_Favorites + trump_Retweets + USEmbassyManama_Total_Posts + USEmbassyManama_Favorites + USEmbassyManama_Retweets + TEMP + DEWP + WDSP + MXSPD + PRCP + zinc_Close + zinc_Open + zinc_High + zinc_Low + WTI_Close + WTI_Open + WTI_High + WTI_Low + wheat_Close + wheat_Open + wheat_High + wheat_Low + tin_Close + tin_Open + tin_High + tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low + soybean_Close + soybean_Open + soybean_High + soybean_Low + silver_Close + silver_Open + silver_High + silver_Low + rice_Close + rice_Open + rice_High + rice_Low + platinum_Close + platinum_Open + platinum_High + platinum_Low + nickel_Close + nickel_Open + nickel_High + nickel_Low + natural_gas_Close + natural_gas_Open + natural_gas_High + natural_gas_Low + monero_Close + monero_Open + monero_High + monero_Low + litecoin_Close + litecoin_Open + litecoin_High + litecoin_Low + lead_Close + lead_Open + lead_High + lead_Low + Iron_USA_Close + Iron_USA_Open + Iron_USA_High + Iron_USA_Low + Gold_Close + Gold_Open + Gold_High + Gold_Low +GBP_BHD_Close + GBP_BHD_Open+ GBP_BHD_High + GBP_BHD_Low + cotton_Close + cotton_Open + cotton_High + cotton_Low +  corn_Close +  corn_Open + corn_High + corn_Low + copper_Close + copper_Open + copper_High + copper_Low + coffee_Close + coffee_Open + coffee_High + coffee_Low + live_cattle_Close + live_cattle_Open + live_cattle_High + live_cattle_Low + feed_cattle_Close + feed_cattle_Open + feed_cattle_High + feed_cattle_Low + Brent_Close + Brent_Open + Brent_High + Brent_Low + Bitcoin_Close + Bitcoin_Open + Bitcoin_High + Bitcoin_Low + BHD_EUR_Close + BHD_EUR_Open + BHD_EUR_High + BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + BAX_Low) ,data = master_df), y = master_df$riot_count, nfolds = 10, nlambda = 200, alpha = 0.50)

model_cvglm_75 <- cv.glmnet(x=model.matrix(riot_count ~ (AJArabic_Total_Posts + AJArabic_Favorites + AJArabic_Retweets + al_wafa_Total_Posts + al_wafa_Favorites + al_wafa_Retweets + Alwatan_Live_Total_Posts  + Alwatan_Live_Favorites +  Alwatan_Live_Retweets + bahrain_moi_Total_Posts + bahrain_moi_Favorites + bahrain_moi_Retweets + BahrainRights_Total_Posts + BahrainRights_Favorites + BahrainRights_Retweets + BBCArabic_Total_Posts  + BBCArabic_Favorites + BBCArabic_Retweets + bh14feb2011_Total_Posts + bh14feb2011_Favorites + bh14feb2011_Retweets + bna_ar_Total_Posts + bna_ar_Favorites + bna_ar_Retweets + Coalition14_Total_Posts + Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Total_Posts + duraz_youth_Favorites + duraz_youth_Retweets + feb14revolution_Total_Posts + feb14revolution_Favorites +    feb14revolution_Retweets + GDNonline_Total_Posts + GDNonline_Favorites + GDNonline_Retweets + Iran_Total_Posts + Iran_Favorites + Iran_Retweets + IranNW_Total_Posts + IranNW_Favorites + IranNW_Retweets  + khalidalkhalifa_Total_Posts + khalidalkhalifa_Favorites + khalidalkhalifa_Retweets + khamenei_Total_Posts  + khamenei_Favorites + khamenei_Retweets + KUhp2222_Total_Posts + KUhp2222_Favorites +  KUhp2222_Retweets + malarab1_Total_Posts + malarab1_Favorites + malarab1_Retweets + NABEELRAJAB_Total_Posts + NABEELRAJAB_Favorites + NABEELRAJAB_Retweets + netanyahu_Total_Posts + netanyahu_Favorites + netanyahu_Retweets + NSA_Bahrain_Total_Posts + NSA_Bahrain_Favorites + NSA_Bahrain_Retweets + rouhani_Total_Posts + rouhani_Favorites + rouhani_Retweets + trump_Total_Posts + trump_Favorites + trump_Retweets + USEmbassyManama_Total_Posts + USEmbassyManama_Favorites + USEmbassyManama_Retweets + TEMP + DEWP + WDSP + MXSPD + PRCP + zinc_Close + zinc_Open + zinc_High + zinc_Low + WTI_Close + WTI_Open + WTI_High + WTI_Low + wheat_Close + wheat_Open + wheat_High + wheat_Low + tin_Close + tin_Open + tin_High + tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low + soybean_Close + soybean_Open + soybean_High + soybean_Low + silver_Close + silver_Open + silver_High + silver_Low + rice_Close + rice_Open + rice_High + rice_Low + platinum_Close + platinum_Open + platinum_High + platinum_Low + nickel_Close + nickel_Open + nickel_High + nickel_Low + natural_gas_Close + natural_gas_Open + natural_gas_High + natural_gas_Low + monero_Close + monero_Open + monero_High + monero_Low + litecoin_Close + litecoin_Open + litecoin_High + litecoin_Low + lead_Close + lead_Open + lead_High + lead_Low + Iron_USA_Close + Iron_USA_Open + Iron_USA_High + Iron_USA_Low + Gold_Close + Gold_Open + Gold_High + Gold_Low +GBP_BHD_Close + GBP_BHD_Open+ GBP_BHD_High + GBP_BHD_Low + cotton_Close + cotton_Open + cotton_High + cotton_Low +  corn_Close +  corn_Open + corn_High + corn_Low + copper_Close + copper_Open + copper_High + copper_Low + coffee_Close + coffee_Open + coffee_High + coffee_Low + live_cattle_Close + live_cattle_Open + live_cattle_High + live_cattle_Low + feed_cattle_Close + feed_cattle_Open + feed_cattle_High + feed_cattle_Low + Brent_Close + Brent_Open + Brent_High + Brent_Low + Bitcoin_Close + Bitcoin_Open + Bitcoin_High + Bitcoin_Low + BHD_EUR_Close + BHD_EUR_Open + BHD_EUR_High + BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + BAX_Low) , data = master_df), y = master_df$riot_count, nfolds = 10, nlambda = 200, alpha = 0.75)

# Calculate R-squared for each model
# Code sourced from: https://stackoverflow.com/questions/50610895/how-to-calculate-r-squared-value-for-lasso-regression-using-glmnet-in-r
model_cvglm_25_rsquared <- model_cvglm_25$glmnet.fit$dev.ratio
model_cvglm_50_rsquared <- model_cvglm_50$glmnet.fit$dev.ratio
model_cvglm_75_rsquared <- model_cvglm_75$glmnet.fit$dev.ratio

# Generate R^2 plots
plot(model_cvglm_25$lambda,model_cvglm_25_rsquared, xlab = 'Lambda', ylab = 'R^2', main = '0.25 Elastic Model R^2')

plot(model_cvglm_50$lambda,model_cvglm_50_rsquared, xlab = 'Lambda', ylab = 'R^2', main = '0.50 Elastic Model R^2')

plot(model_cvglm_75$lambda,model_cvglm_75_rsquared, xlab = 'Lambda', ylab = 'R^2', main = '0.75 Elastic Model R^2')

# Produce the coefficients of the elastic models
coef(model_cvglm_25)
coef(model_cvglm_50)
coef(model_cvglm_75)
# Analyze model data
model_cvglm_25
## 
## Call:  cv.glmnet(x = model.matrix(riot_count ~ (AJArabic_Total_Posts +      AJArabic_Favorites + AJArabic_Retweets + al_wafa_Total_Posts +      al_wafa_Favorites + al_wafa_Retweets + Alwatan_Live_Total_Posts +      Alwatan_Live_Favorites + Alwatan_Live_Retweets + bahrain_moi_Total_Posts +      bahrain_moi_Favorites + bahrain_moi_Retweets + BahrainRights_Total_Posts +      BahrainRights_Favorites + BahrainRights_Retweets + BBCArabic_Total_Posts +      BBCArabic_Favorites + BBCArabic_Retweets + bh14feb2011_Total_Posts +      bh14feb2011_Favorites + bh14feb2011_Retweets + bna_ar_Total_Posts +      bna_ar_Favorites + bna_ar_Retweets + Coalition14_Total_Posts +      Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Total_Posts +      duraz_youth_Favorites + duraz_youth_Retweets + feb14revolution_Total_Posts +      feb14revolution_Favorites + feb14revolution_Retweets + GDNonline_Total_Posts +      GDNonline_Favorites + GDNonline_Retweets + Iran_Total_Posts +      Iran_Favorites + Iran_Retweets + IranNW_Total_Posts + IranNW_Favorites +      IranNW_Retweets + khalidalkhalifa_Total_Posts + khalidalkhalifa_Favorites +      khalidalkhalifa_Retweets + khamenei_Total_Posts + khamenei_Favorites +      khamenei_Retweets + KUhp2222_Total_Posts + KUhp2222_Favorites +      KUhp2222_Retweets + malarab1_Total_Posts + malarab1_Favorites +      malarab1_Retweets + NABEELRAJAB_Total_Posts + NABEELRAJAB_Favorites +      NABEELRAJAB_Retweets + netanyahu_Total_Posts + netanyahu_Favorites +      netanyahu_Retweets + NSA_Bahrain_Total_Posts + NSA_Bahrain_Favorites +      NSA_Bahrain_Retweets + rouhani_Total_Posts + rouhani_Favorites +      rouhani_Retweets + trump_Total_Posts + trump_Favorites +      trump_Retweets + USEmbassyManama_Total_Posts + USEmbassyManama_Favorites +      USEmbassyManama_Retweets + TEMP + DEWP + WDSP + MXSPD + PRCP +      zinc_Close + zinc_Open + zinc_High + zinc_Low + WTI_Close +      WTI_Open + WTI_High + WTI_Low + wheat_Close + wheat_Open +      wheat_High + wheat_Low + tin_Close + tin_Open + tin_High +      tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low +      soybean_Close + soybean_Open + soybean_High + soybean_Low +      silver_Close + silver_Open + silver_High + silver_Low + rice_Close +      rice_Open + rice_High + rice_Low + platinum_Close + platinum_Open +      platinum_High + platinum_Low + nickel_Close + nickel_Open +      nickel_High + nickel_Low + natural_gas_Close + natural_gas_Open +      natural_gas_High + natural_gas_Low + monero_Close + monero_Open +      monero_High + monero_Low + litecoin_Close + litecoin_Open +      litecoin_High + litecoin_Low + lead_Close + lead_Open + lead_High +      lead_Low + Iron_USA_Close + Iron_USA_Open + Iron_USA_High +      Iron_USA_Low + Gold_Close + Gold_Open + Gold_High + Gold_Low +      GBP_BHD_Close + GBP_BHD_Open + GBP_BHD_High + GBP_BHD_Low +      cotton_Close + cotton_Open + cotton_High + cotton_Low + corn_Close +      corn_Open + corn_High + corn_Low + copper_Close + copper_Open +      copper_High + copper_Low + coffee_Close + coffee_Open + coffee_High +      coffee_Low + live_cattle_Close + live_cattle_Open + live_cattle_High +      live_cattle_Low + feed_cattle_Close + feed_cattle_Open +      feed_cattle_High + feed_cattle_Low + Brent_Close + Brent_Open +      Brent_High + Brent_Low + Bitcoin_Close + Bitcoin_Open + Bitcoin_High +      Bitcoin_Low + BHD_EUR_Close + BHD_EUR_Open + BHD_EUR_High +      BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + BAX_Low),      data = master_df), y = master_df$riot_count, nfolds = 10,      nlambda = 200, alpha = 0.25) 
## 
## Measure: Mean-Squared Error 
## 
##     Lambda Measure     SE Nonzero
## min 0.3739   4.916 0.5647      43
## 1se 1.4313   5.481 0.6602      15
model_cvglm_50
## 
## Call:  cv.glmnet(x = model.matrix(riot_count ~ (AJArabic_Total_Posts +      AJArabic_Favorites + AJArabic_Retweets + al_wafa_Total_Posts +      al_wafa_Favorites + al_wafa_Retweets + Alwatan_Live_Total_Posts +      Alwatan_Live_Favorites + Alwatan_Live_Retweets + bahrain_moi_Total_Posts +      bahrain_moi_Favorites + bahrain_moi_Retweets + BahrainRights_Total_Posts +      BahrainRights_Favorites + BahrainRights_Retweets + BBCArabic_Total_Posts +      BBCArabic_Favorites + BBCArabic_Retweets + bh14feb2011_Total_Posts +      bh14feb2011_Favorites + bh14feb2011_Retweets + bna_ar_Total_Posts +      bna_ar_Favorites + bna_ar_Retweets + Coalition14_Total_Posts +      Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Total_Posts +      duraz_youth_Favorites + duraz_youth_Retweets + feb14revolution_Total_Posts +      feb14revolution_Favorites + feb14revolution_Retweets + GDNonline_Total_Posts +      GDNonline_Favorites + GDNonline_Retweets + Iran_Total_Posts +      Iran_Favorites + Iran_Retweets + IranNW_Total_Posts + IranNW_Favorites +      IranNW_Retweets + khalidalkhalifa_Total_Posts + khalidalkhalifa_Favorites +      khalidalkhalifa_Retweets + khamenei_Total_Posts + khamenei_Favorites +      khamenei_Retweets + KUhp2222_Total_Posts + KUhp2222_Favorites +      KUhp2222_Retweets + malarab1_Total_Posts + malarab1_Favorites +      malarab1_Retweets + NABEELRAJAB_Total_Posts + NABEELRAJAB_Favorites +      NABEELRAJAB_Retweets + netanyahu_Total_Posts + netanyahu_Favorites +      netanyahu_Retweets + NSA_Bahrain_Total_Posts + NSA_Bahrain_Favorites +      NSA_Bahrain_Retweets + rouhani_Total_Posts + rouhani_Favorites +      rouhani_Retweets + trump_Total_Posts + trump_Favorites +      trump_Retweets + USEmbassyManama_Total_Posts + USEmbassyManama_Favorites +      USEmbassyManama_Retweets + TEMP + DEWP + WDSP + MXSPD + PRCP +      zinc_Close + zinc_Open + zinc_High + zinc_Low + WTI_Close +      WTI_Open + WTI_High + WTI_Low + wheat_Close + wheat_Open +      wheat_High + wheat_Low + tin_Close + tin_Open + tin_High +      tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low +      soybean_Close + soybean_Open + soybean_High + soybean_Low +      silver_Close + silver_Open + silver_High + silver_Low + rice_Close +      rice_Open + rice_High + rice_Low + platinum_Close + platinum_Open +      platinum_High + platinum_Low + nickel_Close + nickel_Open +      nickel_High + nickel_Low + natural_gas_Close + natural_gas_Open +      natural_gas_High + natural_gas_Low + monero_Close + monero_Open +      monero_High + monero_Low + litecoin_Close + litecoin_Open +      litecoin_High + litecoin_Low + lead_Close + lead_Open + lead_High +      lead_Low + Iron_USA_Close + Iron_USA_Open + Iron_USA_High +      Iron_USA_Low + Gold_Close + Gold_Open + Gold_High + Gold_Low +      GBP_BHD_Close + GBP_BHD_Open + GBP_BHD_High + GBP_BHD_Low +      cotton_Close + cotton_Open + cotton_High + cotton_Low + corn_Close +      corn_Open + corn_High + corn_Low + copper_Close + copper_Open +      copper_High + copper_Low + coffee_Close + coffee_Open + coffee_High +      coffee_Low + live_cattle_Close + live_cattle_Open + live_cattle_High +      live_cattle_Low + feed_cattle_Close + feed_cattle_Open +      feed_cattle_High + feed_cattle_Low + Brent_Close + Brent_Open +      Brent_High + Brent_Low + Bitcoin_Close + Bitcoin_Open + Bitcoin_High +      Bitcoin_Low + BHD_EUR_Close + BHD_EUR_Open + BHD_EUR_High +      BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + BAX_Low),      data = master_df), y = master_df$riot_count, nfolds = 10,      nlambda = 200, alpha = 0.5) 
## 
## Measure: Mean-Squared Error 
## 
##     Lambda Measure     SE Nonzero
## min 0.2051   4.937 0.3921      34
## 1se 0.5678   5.313 0.4209      14
model_cvglm_75 
## 
## Call:  cv.glmnet(x = model.matrix(riot_count ~ (AJArabic_Total_Posts +      AJArabic_Favorites + AJArabic_Retweets + al_wafa_Total_Posts +      al_wafa_Favorites + al_wafa_Retweets + Alwatan_Live_Total_Posts +      Alwatan_Live_Favorites + Alwatan_Live_Retweets + bahrain_moi_Total_Posts +      bahrain_moi_Favorites + bahrain_moi_Retweets + BahrainRights_Total_Posts +      BahrainRights_Favorites + BahrainRights_Retweets + BBCArabic_Total_Posts +      BBCArabic_Favorites + BBCArabic_Retweets + bh14feb2011_Total_Posts +      bh14feb2011_Favorites + bh14feb2011_Retweets + bna_ar_Total_Posts +      bna_ar_Favorites + bna_ar_Retweets + Coalition14_Total_Posts +      Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Total_Posts +      duraz_youth_Favorites + duraz_youth_Retweets + feb14revolution_Total_Posts +      feb14revolution_Favorites + feb14revolution_Retweets + GDNonline_Total_Posts +      GDNonline_Favorites + GDNonline_Retweets + Iran_Total_Posts +      Iran_Favorites + Iran_Retweets + IranNW_Total_Posts + IranNW_Favorites +      IranNW_Retweets + khalidalkhalifa_Total_Posts + khalidalkhalifa_Favorites +      khalidalkhalifa_Retweets + khamenei_Total_Posts + khamenei_Favorites +      khamenei_Retweets + KUhp2222_Total_Posts + KUhp2222_Favorites +      KUhp2222_Retweets + malarab1_Total_Posts + malarab1_Favorites +      malarab1_Retweets + NABEELRAJAB_Total_Posts + NABEELRAJAB_Favorites +      NABEELRAJAB_Retweets + netanyahu_Total_Posts + netanyahu_Favorites +      netanyahu_Retweets + NSA_Bahrain_Total_Posts + NSA_Bahrain_Favorites +      NSA_Bahrain_Retweets + rouhani_Total_Posts + rouhani_Favorites +      rouhani_Retweets + trump_Total_Posts + trump_Favorites +      trump_Retweets + USEmbassyManama_Total_Posts + USEmbassyManama_Favorites +      USEmbassyManama_Retweets + TEMP + DEWP + WDSP + MXSPD + PRCP +      zinc_Close + zinc_Open + zinc_High + zinc_Low + WTI_Close +      WTI_Open + WTI_High + WTI_Low + wheat_Close + wheat_Open +      wheat_High + wheat_Low + tin_Close + tin_Open + tin_High +      tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low +      soybean_Close + soybean_Open + soybean_High + soybean_Low +      silver_Close + silver_Open + silver_High + silver_Low + rice_Close +      rice_Open + rice_High + rice_Low + platinum_Close + platinum_Open +      platinum_High + platinum_Low + nickel_Close + nickel_Open +      nickel_High + nickel_Low + natural_gas_Close + natural_gas_Open +      natural_gas_High + natural_gas_Low + monero_Close + monero_Open +      monero_High + monero_Low + litecoin_Close + litecoin_Open +      litecoin_High + litecoin_Low + lead_Close + lead_Open + lead_High +      lead_Low + Iron_USA_Close + Iron_USA_Open + Iron_USA_High +      Iron_USA_Low + Gold_Close + Gold_Open + Gold_High + Gold_Low +      GBP_BHD_Close + GBP_BHD_Open + GBP_BHD_High + GBP_BHD_Low +      cotton_Close + cotton_Open + cotton_High + cotton_Low + corn_Close +      corn_Open + corn_High + corn_Low + copper_Close + copper_Open +      copper_High + copper_Low + coffee_Close + coffee_Open + coffee_High +      coffee_Low + live_cattle_Close + live_cattle_Open + live_cattle_High +      live_cattle_Low + feed_cattle_Close + feed_cattle_Open +      feed_cattle_High + feed_cattle_Low + Brent_Close + Brent_Open +      Brent_High + Brent_Low + Bitcoin_Close + Bitcoin_Open + Bitcoin_High +      Bitcoin_Low + BHD_EUR_Close + BHD_EUR_Open + BHD_EUR_High +      BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + BAX_Low),      data = master_df), y = master_df$riot_count, nfolds = 10,      nlambda = 200, alpha = 0.75) 
## 
## Measure: Mean-Squared Error 
## 
##     Lambda Measure     SE Nonzero
## min 0.1432   4.938 0.4613      27
## 1se 0.4555   5.388 0.5132       9
# Generate MSE
model_cvglm_25_mse <- model_cvglm_25$cvm[model_cvglm_25 $lambda == model_cvglm_25 $lambda.min]
model_cvglm_50_mse <- model_cvglm_50$cvm[model_cvglm_50 $lambda == model_cvglm_50 $lambda.min]
model_cvglm_75_mse <- model_cvglm_75$cvm[model_cvglm_75 $lambda == model_cvglm_75 $lambda.min]

# Plot
plot(model_cvglm_25)

plot(model_cvglm_50)

plot(model_cvglm_75)

8. LASSO Model

# Compute a LASSO model
model_lasso <- cv.glmnet(x=model.matrix(riot_count ~ (AJArabic_Total_Posts + AJArabic_Favorites + AJArabic_Retweets + al_wafa_Total_Posts + al_wafa_Favorites + al_wafa_Retweets + Alwatan_Live_Total_Posts  + Alwatan_Live_Favorites +  Alwatan_Live_Retweets + bahrain_moi_Total_Posts + bahrain_moi_Favorites + bahrain_moi_Retweets + BahrainRights_Total_Posts + BahrainRights_Favorites + BahrainRights_Retweets + BBCArabic_Total_Posts  + BBCArabic_Favorites + BBCArabic_Retweets + bh14feb2011_Total_Posts + bh14feb2011_Favorites + bh14feb2011_Retweets + bna_ar_Total_Posts + bna_ar_Favorites + bna_ar_Retweets + Coalition14_Total_Posts + Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Total_Posts + duraz_youth_Favorites + duraz_youth_Retweets + feb14revolution_Total_Posts + feb14revolution_Favorites +    feb14revolution_Retweets + GDNonline_Total_Posts + GDNonline_Favorites + GDNonline_Retweets + Iran_Total_Posts + Iran_Favorites + Iran_Retweets + IranNW_Total_Posts + IranNW_Favorites + IranNW_Retweets  + khalidalkhalifa_Total_Posts + khalidalkhalifa_Favorites + khalidalkhalifa_Retweets + khamenei_Total_Posts  + khamenei_Favorites + khamenei_Retweets + KUhp2222_Total_Posts + KUhp2222_Favorites +  KUhp2222_Retweets + malarab1_Total_Posts + malarab1_Favorites + malarab1_Retweets + NABEELRAJAB_Total_Posts + NABEELRAJAB_Favorites + NABEELRAJAB_Retweets + netanyahu_Total_Posts + netanyahu_Favorites + netanyahu_Retweets + NSA_Bahrain_Total_Posts + NSA_Bahrain_Favorites + NSA_Bahrain_Retweets + rouhani_Total_Posts + rouhani_Favorites + rouhani_Retweets + trump_Total_Posts + trump_Favorites + trump_Retweets + USEmbassyManama_Total_Posts + USEmbassyManama_Favorites + USEmbassyManama_Retweets + TEMP + DEWP + WDSP + MXSPD + PRCP + zinc_Close + zinc_Open + zinc_High + zinc_Low + WTI_Close + WTI_Open + WTI_High + WTI_Low + wheat_Close + wheat_Open + wheat_High + wheat_Low + tin_Close + tin_Open + tin_High + tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low + soybean_Close + soybean_Open + soybean_High + soybean_Low + silver_Close + silver_Open + silver_High + silver_Low + rice_Close + rice_Open + rice_High + rice_Low + platinum_Close + platinum_Open + platinum_High + platinum_Low + nickel_Close + nickel_Open + nickel_High + nickel_Low + natural_gas_Close + natural_gas_Open + natural_gas_High + natural_gas_Low + monero_Close + monero_Open + monero_High + monero_Low + litecoin_Close + litecoin_Open + litecoin_High + litecoin_Low + lead_Close + lead_Open + lead_High + lead_Low + Iron_USA_Close + Iron_USA_Open + Iron_USA_High + Iron_USA_Low + Gold_Close + Gold_Open + Gold_High + Gold_Low +GBP_BHD_Close + GBP_BHD_Open+ GBP_BHD_High + GBP_BHD_Low + cotton_Close + cotton_Open + cotton_High + cotton_Low +  corn_Close +  corn_Open + corn_High + corn_Low + copper_Close + copper_Open + copper_High + copper_Low + coffee_Close + coffee_Open + coffee_High + coffee_Low + live_cattle_Close + live_cattle_Open + live_cattle_High + live_cattle_Low + feed_cattle_Close + feed_cattle_Open + feed_cattle_High + feed_cattle_Low + Brent_Close + Brent_Open + Brent_High + Brent_Low + Bitcoin_Close + Bitcoin_Open + Bitcoin_High + Bitcoin_Low + BHD_EUR_Close + BHD_EUR_Open + BHD_EUR_High + BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + BAX_Low), data = master_df), y = master_df$riot_count, alpha = 1)

# Calculate R-squared
model_lasso_rsquared <- model_lasso$glmnet.fit$dev.ratio

# Plot r-squared for the LASSO model
plot20 <- plot(model_lasso$lambda,model_lasso_rsquared, xlab = 'Lambda', ylab = 'R^2', main = 'Lasso Model R^2')

# Generate coefficients
coef(model_lasso)
# Generate MSE
model_lasso_MSE <- model_lasso$cvm[model_lasso$lambda == model_lasso$lambda.min]

# Analyze model data
model_lasso
## 
## Call:  cv.glmnet(x = model.matrix(riot_count ~ (AJArabic_Total_Posts +      AJArabic_Favorites + AJArabic_Retweets + al_wafa_Total_Posts +      al_wafa_Favorites + al_wafa_Retweets + Alwatan_Live_Total_Posts +      Alwatan_Live_Favorites + Alwatan_Live_Retweets + bahrain_moi_Total_Posts +      bahrain_moi_Favorites + bahrain_moi_Retweets + BahrainRights_Total_Posts +      BahrainRights_Favorites + BahrainRights_Retweets + BBCArabic_Total_Posts +      BBCArabic_Favorites + BBCArabic_Retweets + bh14feb2011_Total_Posts +      bh14feb2011_Favorites + bh14feb2011_Retweets + bna_ar_Total_Posts +      bna_ar_Favorites + bna_ar_Retweets + Coalition14_Total_Posts +      Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Total_Posts +      duraz_youth_Favorites + duraz_youth_Retweets + feb14revolution_Total_Posts +      feb14revolution_Favorites + feb14revolution_Retweets + GDNonline_Total_Posts +      GDNonline_Favorites + GDNonline_Retweets + Iran_Total_Posts +      Iran_Favorites + Iran_Retweets + IranNW_Total_Posts + IranNW_Favorites +      IranNW_Retweets + khalidalkhalifa_Total_Posts + khalidalkhalifa_Favorites +      khalidalkhalifa_Retweets + khamenei_Total_Posts + khamenei_Favorites +      khamenei_Retweets + KUhp2222_Total_Posts + KUhp2222_Favorites +      KUhp2222_Retweets + malarab1_Total_Posts + malarab1_Favorites +      malarab1_Retweets + NABEELRAJAB_Total_Posts + NABEELRAJAB_Favorites +      NABEELRAJAB_Retweets + netanyahu_Total_Posts + netanyahu_Favorites +      netanyahu_Retweets + NSA_Bahrain_Total_Posts + NSA_Bahrain_Favorites +      NSA_Bahrain_Retweets + rouhani_Total_Posts + rouhani_Favorites +      rouhani_Retweets + trump_Total_Posts + trump_Favorites +      trump_Retweets + USEmbassyManama_Total_Posts + USEmbassyManama_Favorites +      USEmbassyManama_Retweets + TEMP + DEWP + WDSP + MXSPD + PRCP +      zinc_Close + zinc_Open + zinc_High + zinc_Low + WTI_Close +      WTI_Open + WTI_High + WTI_Low + wheat_Close + wheat_Open +      wheat_High + wheat_Low + tin_Close + tin_Open + tin_High +      tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low +      soybean_Close + soybean_Open + soybean_High + soybean_Low +      silver_Close + silver_Open + silver_High + silver_Low + rice_Close +      rice_Open + rice_High + rice_Low + platinum_Close + platinum_Open +      platinum_High + platinum_Low + nickel_Close + nickel_Open +      nickel_High + nickel_Low + natural_gas_Close + natural_gas_Open +      natural_gas_High + natural_gas_Low + monero_Close + monero_Open +      monero_High + monero_Low + litecoin_Close + litecoin_Open +      litecoin_High + litecoin_Low + lead_Close + lead_Open + lead_High +      lead_Low + Iron_USA_Close + Iron_USA_Open + Iron_USA_High +      Iron_USA_Low + Gold_Close + Gold_Open + Gold_High + Gold_Low +      GBP_BHD_Close + GBP_BHD_Open + GBP_BHD_High + GBP_BHD_Low +      cotton_Close + cotton_Open + cotton_High + cotton_Low + corn_Close +      corn_Open + corn_High + corn_Low + copper_Close + copper_Open +      copper_High + copper_Low + coffee_Close + coffee_Open + coffee_High +      coffee_Low + live_cattle_Close + live_cattle_Open + live_cattle_High +      live_cattle_Low + feed_cattle_Close + feed_cattle_Open +      feed_cattle_High + feed_cattle_Low + Brent_Close + Brent_Open +      Brent_High + Brent_Low + Bitcoin_Close + Bitcoin_Open + Bitcoin_High +      Bitcoin_Low + BHD_EUR_Close + BHD_EUR_Open + BHD_EUR_High +      BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + BAX_Low),      data = master_df), y = master_df$riot_count, alpha = 1) 
## 
## Measure: Mean-Squared Error 
## 
##     Lambda Measure     SE Nonzero
## min 0.1060   5.000 0.4534      25
## 1se 0.3237   5.408 0.5369       8
 # Plot
plot(model_lasso)
title(main = 'LASSO Model', adj = 0, line = -.001)

We will now apply Principal Component Analysis (PCA) to reduce the dimensionality. We then create a new target variable using the loadings and create an AdaBoost model.

9. PCA & AdaBoost

# Determine dimensions of dataframe
d <- dim(master_df) 

# Set a boundary to capture ACLED data
s <- (d[2]-33):d[2] 

# Create a subset of the data
events <- master_df[, s] 

# Run a summary
summary(events) 
##  Armed_clash_count  Armed_clash_fatalities  battle_count     
##  Min.   :0.000000   Min.   :0.000000       Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.000000       1st Qu.:0.000000  
##  Median :0.000000   Median :0.000000       Median :0.000000  
##  Mean   :0.004791   Mean   :0.002738       Mean   :0.004791  
##  3rd Qu.:0.000000   3rd Qu.:0.000000       3rd Qu.:0.000000  
##  Max.   :1.000000   Max.   :3.000000       Max.   :1.000000  
##  battle_fatalities    SAMA_count        SAMA_fatalities  RELIED_count     
##  Min.   :0.000000   Min.   :0.0000000   Min.   :0       Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.0000000   1st Qu.:0       1st Qu.:0.000000  
##  Median :0.000000   Median :0.0000000   Median :0       Median :0.000000  
##  Mean   :0.002738   Mean   :0.0006845   Mean   :0       Mean   :0.008898  
##  3rd Qu.:0.000000   3rd Qu.:0.0000000   3rd Qu.:0       3rd Qu.:0.000000  
##  Max.   :3.000000   Max.   :1.0000000   Max.   :0       Max.   :1.000000  
##  RELIED_fatalities  Grenade_count       Grenade_fatalities explosion_count  
##  Min.   :0.000000   Min.   :0.0000000   Min.   :0          Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.:0.0000000   1st Qu.:0          1st Qu.:0.00000  
##  Median :0.000000   Median :0.0000000   Median :0          Median :0.00000  
##  Mean   :0.002738   Mean   :0.0006845   Mean   :0          Mean   :0.01027  
##  3rd Qu.:0.000000   3rd Qu.:0.0000000   3rd Qu.:0          3rd Qu.:0.00000  
##  Max.   :1.000000   Max.   :1.0000000   Max.   :0          Max.   :1.00000  
##  explosions_fatalities Sexual_violence_count sexual_violence_fatalities
##  Min.   :0.000000      Min.   :0.000000      Min.   :0                 
##  1st Qu.:0.000000      1st Qu.:0.000000      1st Qu.:0                 
##  Median :0.000000      Median :0.000000      Median :0                 
##  Mean   :0.002738      Mean   :0.004107      Mean   :0                 
##  3rd Qu.:0.000000      3rd Qu.:0.000000      3rd Qu.:0                 
##  Max.   :1.000000      Max.   :2.000000      Max.   :0                 
##   Attack_count     Attack_fatalities    VAC_count       VAC_fatalities    
##  Min.   :0.00000   Min.   :0.000000   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.000000  
##  Median :0.00000   Median :0.000000   Median :0.00000   Median :0.000000  
##  Mean   :0.00616   Mean   :0.002053   Mean   :0.01027   Mean   :0.002053  
##  3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0.000000  
##  Max.   :1.00000   Max.   :1.000000   Max.   :2.00000   Max.   :1.000000  
##    EFAP_count       EFAP_fatalities      PWI_count      PWI_fatalities
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.0000   Min.   :0     
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.0000   1st Qu.:0     
##  Median :0.000000   Median :0.000000   Median :0.0000   Median :0     
##  Mean   :0.008214   Mean   :0.002053   Mean   :0.1951   Mean   :0     
##  3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.0000   3rd Qu.:0     
##  Max.   :2.000000   Max.   :2.000000   Max.   :8.0000   Max.   :0     
##  Peaceful_protest_count Peaceful_protest_fatalities protest_count   
##  Min.   : 0.000         Min.   :0                   Min.   : 0.000  
##  1st Qu.: 0.000         1st Qu.:0                   1st Qu.: 0.000  
##  Median : 2.000         Median :0                   Median : 2.000  
##  Mean   : 3.606         Mean   :0                   Mean   : 3.809  
##  3rd Qu.: 5.000         3rd Qu.:0                   3rd Qu.: 5.000  
##  Max.   :39.000         Max.   :0                   Max.   :42.000  
##  protest_fatalities mob_violence_count mob_violence_fatalities
##  Min.   :0.000000   Min.   :0.0000     Min.   :0.000000       
##  1st Qu.:0.000000   1st Qu.:0.0000     1st Qu.:0.000000       
##  Median :0.000000   Median :0.0000     Median :0.000000       
##  Mean   :0.002053   Mean   :0.6454     Mean   :0.001369       
##  3rd Qu.:0.000000   3rd Qu.:1.0000     3rd Qu.:0.000000       
##  Max.   :2.000000   Max.   :9.0000     Max.   :1.000000       
##  violent_demonstration_count violent_demonstration_fatalities   riot_count    
##  Min.   : 0.00               Min.   :0.000000                 Min.   : 0.000  
##  1st Qu.: 0.00               1st Qu.:0.000000                 1st Qu.: 0.000  
##  Median : 1.00               Median :0.000000                 Median : 1.000  
##  Mean   : 1.57               Mean   :0.003422                 Mean   : 2.216  
##  3rd Qu.: 2.00               3rd Qu.:0.000000                 3rd Qu.: 3.000  
##  Max.   :27.00               Max.   :5.000000                 Max.   :29.000  
##  riot_fatalities    total_violent_events total_fatalities 
##  Min.   :0.000000   Min.   : 0.00        Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.: 1.00        1st Qu.:0.00000  
##  Median :0.000000   Median : 4.00        Median :0.00000  
##  Mean   :0.004791   Mean   : 6.05        Mean   :0.01437  
##  3rd Qu.:0.000000   3rd Qu.: 8.00        3rd Qu.:0.00000  
##  Max.   :5.000000   Max.   :58.00        Max.   :5.00000
 # Determine dimensions
dim(events)
## [1] 1461   34
# Determine which column/predictor numbers we want to analyze
colnames(events)
##  [1] "Armed_clash_count"                "Armed_clash_fatalities"          
##  [3] "battle_count"                     "battle_fatalities"               
##  [5] "SAMA_count"                       "SAMA_fatalities"                 
##  [7] "RELIED_count"                     "RELIED_fatalities"               
##  [9] "Grenade_count"                    "Grenade_fatalities"              
## [11] "explosion_count"                  "explosions_fatalities"           
## [13] "Sexual_violence_count"            "sexual_violence_fatalities"      
## [15] "Attack_count"                     "Attack_fatalities"               
## [17] "VAC_count"                        "VAC_fatalities"                  
## [19] "EFAP_count"                       "EFAP_fatalities"                 
## [21] "PWI_count"                        "PWI_fatalities"                  
## [23] "Peaceful_protest_count"           "Peaceful_protest_fatalities"     
## [25] "protest_count"                    "protest_fatalities"              
## [27] "mob_violence_count"               "mob_violence_fatalities"         
## [29] "violent_demonstration_count"      "violent_demonstration_fatalities"
## [31] "riot_count"                       "riot_fatalities"                 
## [33] "total_violent_events"             "total_fatalities"
# We select five events to analyze
new_events <- events[, c(23,25,29,31,33)]

# Applying princomp() function to the dataset.
pca1 <- princomp(scale(new_events))

#
plot(pca1, ylim = c(0,5))

# Determine the 'loadings' i.e. weight of each variable
pca1$loadings
## 
## Loadings:
##                             Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Peaceful_protest_count       0.453  0.436  0.105  0.770       
## protest_count                0.455  0.432        -0.512 -0.579
## violent_demonstration_count  0.398 -0.576  0.714              
## riot_count                   0.418 -0.528 -0.657  0.150 -0.303
## total_violent_events         0.504  0.110 -0.197 -0.349  0.757
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## SS loadings       1.0    1.0    1.0    1.0    1.0
## Proportion Var    0.2    0.2    0.2    0.2    0.2
## Cumulative Var    0.2    0.4    0.6    0.8    1.0
# Scree plot
plot(cumsum(pca1$sdev^2/sum(pca1$sdev^2)))

## Histogram Scree chart 
pca.var <- pca1$sdev^2
pca.var.per <- round(pca.var/sum(pca.var)*100, 1)
barplot(pca.var.per, main="Scree Plot", xlab="Principal Component", ylab="Percent Variation", ylim = c(0,100))

pc <- prcomp(new_events, scale=TRUE)

plot(pc)

plot(cumsum(pc$sdev^2/sum(pc$sdev^2)))

# First four principal components
comp <- data.frame(pc$x[,1:4])
# Plot
plot(comp, pch=16, col=rgb(0,0,0,0.5))

autoplot(pc, data=new_events, colour = 'riot_count')

Apply ADABOOST to our new created response variable as identified via PCA

# Create a new predictor variable using the ACLED values (peaceful protests, etc) * their respective weights computed from PCA above
master_df$PCA_Value <- round(master_df$Peaceful_protest_count*pca1$loadings[1] + master_df$protest_count*pca1$loadings[2] + master_df$violent_demonstration_count*pca1$loadings[3] + master_df$riot_count*pca1$loadings[4] + as.numeric(master_df$riot_count)*pca1$loadings[5])

# Apply log transformation
master_df$PCA_Value <- log(master_df$PCA_Value +1)

#master_df$riot_count <- round(exp(master_df$riot_count))
master_df$PCA_Value <- as.factor(master_df$riot_count)

cvmodel = boosting.cv(PCA_Value ~ al_wafa_Retweets + al_wafa_Total_Posts + Coalition14_Total_Posts + Coalition14_Favorites + Alwatan_Live_Total_Posts + BahrainRights_Favorites + AJArabic_Total_Posts + AJArabic_Retweets + BBCArabic_Total_Posts + bh14feb2011_Favorites + bna_ar_Favorites + bna_ar_Retweets + Coalition14_Favorites + Peaceful_protest_count + duraz_youth_Favorites + Iran_Favorites + Iran_Retweets + rouhani_Total_Posts, data = master_df, boos = TRUE, v = 10)
## i:  1 Tue Nov 10 16:08:52 2020 
## i:  2 Tue Nov 10 16:09:33 2020 
## i:  3 Tue Nov 10 16:10:17 2020 
## i:  4 Tue Nov 10 16:11:02 2020 
## i:  5 Tue Nov 10 16:11:42 2020 
## i:  6 Tue Nov 10 16:12:29 2020 
## i:  7 Tue Nov 10 16:13:14 2020 
## i:  8 Tue Nov 10 16:14:35 2020 
## i:  9 Tue Nov 10 16:16:01 2020 
## i:  10 Tue Nov 10 16:17:26 2020
print(cvmodel[-1])
## $confusion
##                Observed Class
## Predicted Class   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
##               0 278 151  75  38  12   5   6   2   1   1   1   0   0   0   0   0
##               1  59  82  58  47  23  16   5   4   0   4   1   1   0   3   1   0
##               2  62 101 122  92  63  34  23  14   7  10   8   2   7   1   2   1
##               3   0   1   7   4   1   2   0   2   0   1   0   0   0   0   0   0
##               4   0   0   0   0   0   3   0   0   1   0   0   1   0   0   0   0
##               5   0   0   0   0   0   0   1   0   1   3   0   0   1   1   1   0
##                Observed Class
## Predicted Class  16  18  20  27  29
##               0   0   0   1   0   0
##               1   0   0   0   0   0
##               2   0   2   1   1   1
##               3   0   0   0   0   0
##               4   0   0   0   0   0
##               5   1   0   0   0   0
## 
## $error
## [1] 0.6673511
data.frame(master_df$PCA_Value, cvmodel$class)

Despite these models performance, the team will create and explore time series models to include MARIMA, DYNLM, univariate, and rolling horizon & random forest.

Time Series Models

10. MARIMA

# Re-read in the dataframe 
master_df <- read.csv('master_df.csv')

# Determine the dimensions of the dataframe
ColNames_df <- as.data.frame(colnames(master_df))

# Determine the dimension of the dataframe
d <- dim(master_df)

# Subset the dataframe
total_violent_events <- master_df[,c(1, 2, 5, 8, 11, 17, 225)]

# Produce a summary of the dataframe
summary(total_violent_events)

# 
Model1 <- define.model(kvar=7, ar=c(1, 1, 1, 1, 1, 1, 1), ma=c(1), reg.var=3)

# Run MARIMA
Marima1 <- marima(ts(total_violent_events[1:900, ]), Model1$ar.pattern, Model1$ma.pattern, penalty=1)

# Identify the starting point
nstart <- 900

# Identify the step
nstep <- 1461 - 900

Forecasts <- arma.forecast(series=ts(total_violent_events), marima=Marima1, nstart=nstart, nstep=nstep )

One.step <- Forecasts$forecasts[, (nstart+1)]
One.step
Predict <- Forecasts$forecasts[7, 901:1461]
Predict
dates<-total_violent_events[901:1461,1]
stdv<-sqrt(Forecasts$pred.var[7, 7, ])
upper.lim=Predict+stdv*1.645
lower.lim=Predict-stdv*1.645
Out<-rbind(dates, Predict, upper.lim, lower.lim)

observations <- total_violent_events[901:1461, 7]
index <- which(observations == 0)
MAPE <- mean(abs(Predict[-index] - observations[-index])/observations[-index])
naive <- total_violent_events[901:1461, 7] - total_violent_events[871:1431, 7]
index2 <- which(naive == 0)
MASE <- mean(abs(Predict[-index2] - observations[-index2])/abs(naive[-index2]))

## Incorporate rolling horizon

FullObs <- ts(total_violent_events)
Training.Series<-list()
Naive.Forecast<-list()
MARIMA.model<-list()
MARIMA.Forecast<-list()
Observed<-list()

Naive.Forecast.Errors<-list()
MARIMA.Forecast.Errors<-list()
Test.Series <- list()

for(i in 901:1461){
  Training.Series[[i]] <- ts(total_violent_events[1:(i-1), ])
  Test.Series[[i]] <- ts(total_violent_events[1:(i+29), ])
  Naive.Forecast[[i]] <- total_violent_events[(i-30):(i-1), 7]
  MARIMA.model[[i]]<-marima(Training.Series[[i]], Model1$ar.pattern, Model1$ma.pattern, penalty=1)
  MARIMA.Forecast[[i]]<-arma.forecast(series=Test.Series[[i]], marima=MARIMA.model[[i]], nstart=(i-1), nstep=30)$forecasts[7, i:(i+29)]
  Observed[[i]] <- total_violent_events[i:(i+29), 7]
  Naive.Forecast.Errors[[i]] <- Observed[[i]] - Naive.Forecast[[i]]
  MARIMA.Forecast.Errors[[i]] <- Observed[[i]] - MARIMA.Forecast[[i]]
  #print(i)  #so we can see progress
}
MARIMA.1Step.Forecast<-do.call(rbind, MARIMA.Forecast)[,1]
Naive.1Step.Forecast<-do.call(rbind, Naive.Forecast)[,1]
Observed.Table<-do.call(rbind, Observed)
Naive.Error.Table<-do.call(rbind, Naive.Forecast.Errors)
MARIMA.Error.Table<-do.call(rbind, MARIMA.Forecast.Errors)

index <- which(Observed.Table[,1] == 0)
Vector.Observed.Table <- as.vector(Observed.Table[1:532,])
Vector.Naive.Error.Table <- as.vector(Naive.Error.Table[1:532,])
Vector.MARIMA.Error.Table <- as.vector(MARIMA.Error.Table[1:532,])

index2 <- which(Vector.Observed.Table == 0)

Naive.1Step.MAPE<-mean(abs(Naive.Error.Table[-index,1])/Observed.Table[-index,1])
Naive.1Yr.MAPE<-mean(abs(Vector.Naive.Error.Table[-index2])/Vector.Observed.Table[-index2])
MARIMA.1Step.MAPE<-mean(abs(MARIMA.Error.Table[-index,1])/Observed.Table[-index,1])
MARIMA.1Yr.MAPE<-mean(abs(Vector.MARIMA.Error.Table[-index2])/Vector.Observed.Table[-index2])  

MARIMA.1Step.MASE<-mean(abs(MARIMA.Error.Table[-index,1]))/mean(abs(Naive.Error.Table[-index,1]))
MARIMA.1Yr.MASE<-mean(abs(Vector.MARIMA.Error.Table[-index2]))/mean(abs(Vector.Naive.Error.Table[-index2]))

#### Results

Naive.Performance<-cbind(Naive.1Step.MAPE, Naive.1Yr.MAPE, 1, 1)
MARIMA.Performance<-cbind(MARIMA.1Step.MAPE, MARIMA.1Yr.MAPE, MARIMA.1Step.MASE, MARIMA.1Yr.MASE)

# Create the table for display
Performance.Matrix<-rbind(Naive.Performance, MARIMA.Performance)
Performance.Matrix<-signif(Performance.Matrix, 2)
Performance.Matrix<-cbind(c("Naive", "MARIMA"), Performance.Matrix)
Performance.Matrix<-data.frame(Performance.Matrix)
colnames(Performance.Matrix)<-c("Model", "1 Step MAPE", "12 Step MAPE", "1 Step MASE", "12 Step MASE")
Performance.Matrix<-knitr::kable(Performance.Matrix, digits=2) 
Performance.Matrix
Model 1 Step MAPE 12 Step MAPE 1 Step MASE 12 Step MASE
Naive 1.3 1.3 1 1
MARIMA 1.2 1.7 0.74 0.93

11. DYNLM

master_df <- read.csv('master_df.csv')

master_df$event_date <- as.Date(master_df$event_date, format = '%m/%d/%Y')

#master_df$total_violent_events <- round(exp(master_df$total_violent_events)-1)

# Create a dataframe with individual column names
ColNames_df <- as.data.frame(colnames(master_df))

master_df$Holiday <- as.factor(master_df$Holiday)

# Create a time series for each variable in the data frame
for(i in 1:nrow(ColNames_df)){
  assign(paste0(ColNames_df[i, ]), ts(master_df[ColNames_df[i,]], start = c(2016, 1, 1), end = c(2019, 12, 31), frequency = 365))
}

# generate the term spread series
TSpread <- al_wafa_Total_Posts - riot_count

# Estimate both equations using 'dynlm()'
VAR_EQ1 <- dynlm(riot_count ~ L(riot_count, c(1,2,6,22,29,30)) + al_wafa_Total_Posts + AJArabic_Retweets + BBCArabic_Total_Posts +  Coalition14_Total_Posts+ duraz_youth_Total_Posts + NABEELRAJAB_Retweets +   Brent_High + Brent_High:Iron_USA_Close + rice_Close + L(TSpread,2), start = c(2016, 1, 1), end = c(2019, 12, 31))

VAR_EQ2 <- dynlm(riot_count ~ L(riot_count, c(18,29,45,48,72))  +  al_wafa_Retweets + al_wafa_Total_Posts*Coalition14_Total_Posts + BAX_Close + BAX_Low + AJArabic_Retweets + BBCArabic_Total_Posts + bh14feb2011_Favorites + bna_ar_Favorites + bna_ar_Retweets +  Brent_Close + Brent_Low + Brent_Open + Coalition14_Favorites + coffee_Close + coffee_High + copper_Close + copper_High + duraz_youth_Favorites + feed_cattle_High + feed_cattle_Open +  Gold_Close + Holiday + Iran_Favorites + Iran_Retweets +  L(TSpread, c(1,14, 28, 47)) + Iron_USA_Close + litecoin_Low + litecoin_Open + live_cattle_Close + live_cattle_Low  + monero_Close + monero_High + monero_Open + MXSPD + nickel_Low + nickel_Open + platinum_Close + rice_Close  + rice_Open + rouhani_Total_Posts + soybean_High  +sugar_Close + TEMP + tin_Close + tin_Open +  WDSP + wheat_Low + wheat_Open + WTI_Close + WTI_Low, start = c(2016, 1, 1), end = c(2019, 12, 31))

summary(VAR_EQ2)
## 
## Time series regression with "ts" data:
## Start = 2016(73), End = 2019(12)
## 
## Call:
## dynlm(formula = riot_count ~ L(riot_count, c(18, 29, 45, 48, 
##     72)) + al_wafa_Retweets + al_wafa_Total_Posts * Coalition14_Total_Posts + 
##     BAX_Close + BAX_Low + AJArabic_Retweets + BBCArabic_Total_Posts + 
##     bh14feb2011_Favorites + bna_ar_Favorites + bna_ar_Retweets + 
##     Brent_Close + Brent_Low + Brent_Open + Coalition14_Favorites + 
##     coffee_Close + coffee_High + copper_Close + copper_High + 
##     duraz_youth_Favorites + feed_cattle_High + feed_cattle_Open + 
##     Gold_Close + Holiday + Iran_Favorites + Iran_Retweets + L(TSpread, 
##     c(1, 14, 28, 47)) + Iron_USA_Close + litecoin_Low + litecoin_Open + 
##     live_cattle_Close + live_cattle_Low + monero_Close + monero_High + 
##     monero_Open + MXSPD + nickel_Low + nickel_Open + platinum_Close + 
##     rice_Close + rice_Open + rouhani_Total_Posts + soybean_High + 
##     sugar_Close + TEMP + tin_Close + tin_Open + WDSP + wheat_Low + 
##     wheat_Open + WTI_Close + WTI_Low, start = c(2016, 1, 1), 
##     end = c(2019, 12, 31))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8143 -1.2567 -0.3165  0.8569 13.7288 
## 
## Coefficients:
##                                               Estimate Std. Error t value
## (Intercept)                                  6.764e+00  5.435e+00   1.245
## L(riot_count, c(18, 29, 45, 48, 72))18      -5.570e-02  2.518e-02  -2.212
## L(riot_count, c(18, 29, 45, 48, 72))29      -9.058e-03  2.544e-02  -0.356
## L(riot_count, c(18, 29, 45, 48, 72))45      -5.441e-02  2.512e-02  -2.166
## L(riot_count, c(18, 29, 45, 48, 72))48       5.985e-02  2.528e-02   2.368
## L(riot_count, c(18, 29, 45, 48, 72))72      -4.977e-02  2.442e-02  -2.038
## al_wafa_Retweets                             1.163e-03  1.046e-03   1.112
## al_wafa_Total_Posts                          3.111e-02  1.035e-02   3.007
## Coalition14_Total_Posts                      3.339e-02  8.266e-03   4.040
## BAX_Close                                    3.824e-02  1.715e-02   2.230
## BAX_Low                                     -3.525e-02  1.715e-02  -2.055
## AJArabic_Retweets                            6.790e-05  2.206e-05   3.078
## BBCArabic_Total_Posts                        1.555e-02  6.456e-03   2.408
## bh14feb2011_Favorites                       -3.307e-03  8.054e-04  -4.106
## bna_ar_Favorites                            -2.203e-03  1.090e-03  -2.022
## bna_ar_Retweets                              5.787e-03  3.066e-03   1.888
## Brent_Close                                 -7.284e-01  2.600e-01  -2.801
## Brent_Low                                    5.829e-01  2.908e-01   2.005
## Brent_Open                                   1.373e-01  1.091e-01   1.259
## Coalition14_Favorites                       -9.916e-04  5.986e-04  -1.657
## coffee_Close                                 1.008e-01  4.964e-02   2.031
## coffee_High                                 -8.109e-02  4.802e-02  -1.689
## copper_Close                                 6.215e+00  2.489e+00   2.497
## copper_High                                 -3.630e+00  2.423e+00  -1.498
## duraz_youth_Favorites                        2.003e-03  4.081e-04   4.907
## feed_cattle_High                            -3.112e-01  8.793e-02  -3.539
## feed_cattle_Open                             2.077e-01  8.702e-02   2.387
## Gold_Close                                  -7.582e-03  3.290e-03  -2.304
## Holiday                                     -8.699e-01  3.372e-01  -2.580
## Iran_Favorites                              -9.536e-04  1.604e-03  -0.595
## Iran_Retweets                                1.227e-03  8.141e-04   1.507
## L(TSpread, c(1, 14, 28, 47))1                1.034e-02  5.544e-03   1.865
## L(TSpread, c(1, 14, 28, 47))14              -1.133e-02  5.120e-03  -2.213
## L(TSpread, c(1, 14, 28, 47))28              -2.652e-02  5.145e-03  -5.154
## L(TSpread, c(1, 14, 28, 47))47              -7.746e-03  4.770e-03  -1.624
## Iron_USA_Close                              -1.270e-02  8.143e-03  -1.560
## litecoin_Low                                 4.891e-02  1.927e-02   2.539
## litecoin_Open                               -4.286e-02  1.815e-02  -2.361
## live_cattle_Close                            3.609e-01  1.053e-01   3.426
## live_cattle_Low                             -2.777e-01  1.056e-01  -2.629
## monero_Close                                -3.093e-02  1.380e-02  -2.241
## monero_High                                  4.232e-02  1.794e-02   2.359
## monero_Open                                 -1.522e-02  1.112e-02  -1.369
## MXSPD                                       -1.914e-02  3.244e-02  -0.590
## nickel_Low                                   7.161e-04  5.874e-04   1.219
## nickel_Open                                 -7.422e-04  5.525e-04  -1.343
## platinum_Close                               8.390e-03  2.471e-03   3.395
## rice_Close                                   5.597e-01  4.878e-01   1.147
## rice_Open                                   -7.359e-01  4.960e-01  -1.484
## rouhani_Total_Posts                         -1.359e-01  8.225e-02  -1.652
## soybean_High                                -2.793e-03  2.302e-03  -1.213
## sugar_Close                                 -2.491e-01  9.227e-02  -2.700
## TEMP                                         3.337e-02  1.424e-02   2.344
## tin_Close                                   -1.135e-03  4.182e-04  -2.715
## tin_Open                                     9.499e-04  4.202e-04   2.261
## WDSP                                         5.661e-02  4.035e-02   1.403
## wheat_Low                                   -2.618e-02  1.551e-02  -1.688
## wheat_Open                                   2.681e-02  1.483e-02   1.808
## WTI_Close                                    6.347e-01  2.791e-01   2.274
## WTI_Low                                     -6.205e-01  2.824e-01  -2.197
## al_wafa_Total_Posts:Coalition14_Total_Posts  4.565e-04  9.884e-05   4.618
##                                             Pr(>|t|)    
## (Intercept)                                 0.213576    
## L(riot_count, c(18, 29, 45, 48, 72))18      0.027214 *  
## L(riot_count, c(18, 29, 45, 48, 72))29      0.721908    
## L(riot_count, c(18, 29, 45, 48, 72))45      0.030539 *  
## L(riot_count, c(18, 29, 45, 48, 72))48      0.018090 *  
## L(riot_count, c(18, 29, 45, 48, 72))72      0.041830 *  
## al_wafa_Retweets                            0.266533    
## al_wafa_Total_Posts                         0.002707 ** 
## Coalition14_Total_Posts                     5.78e-05 ***
## BAX_Close                                   0.026007 *  
## BAX_Low                                     0.040100 *  
## AJArabic_Retweets                           0.002145 ** 
## BBCArabic_Total_Posts                       0.016215 *  
## bh14feb2011_Favorites                       4.37e-05 ***
## bna_ar_Favorites                            0.043495 *  
## bna_ar_Retweets                             0.059365 .  
## Brent_Close                                 0.005191 ** 
## Brent_Low                                   0.045275 *  
## Brent_Open                                  0.208222    
## Coalition14_Favorites                       0.097935 .  
## coffee_Close                                0.042558 *  
## coffee_High                                 0.091632 .  
## copper_Close                                0.012701 *  
## copper_High                                 0.134389    
## duraz_youth_Favorites                       1.08e-06 ***
## feed_cattle_High                            0.000421 ***
## feed_cattle_Open                            0.017192 *  
## Gold_Close                                  0.021410 *  
## Holiday                                     0.010029 *  
## Iran_Favorites                              0.552307    
## Iran_Retweets                               0.132221    
## L(TSpread, c(1, 14, 28, 47))1               0.062489 .  
## L(TSpread, c(1, 14, 28, 47))14              0.027145 *  
## L(TSpread, c(1, 14, 28, 47))28              3.09e-07 ***
## L(TSpread, c(1, 14, 28, 47))47              0.104699    
## Iron_USA_Close                              0.119133    
## litecoin_Low                                0.011287 *  
## litecoin_Open                               0.018399 *  
## live_cattle_Close                           0.000637 ***
## live_cattle_Low                             0.008705 ** 
## monero_Close                                0.025237 *  
## monero_High                                 0.018535 *  
## monero_Open                                 0.171449    
## MXSPD                                       0.555339    
## nickel_Low                                  0.223126    
## nickel_Open                                 0.179468    
## platinum_Close                              0.000714 ***
## rice_Close                                  0.251515    
## rice_Open                                   0.138204    
## rouhani_Total_Posts                         0.098762 .  
## soybean_High                                0.225387    
## sugar_Close                                 0.007062 ** 
## TEMP                                        0.019301 *  
## tin_Close                                   0.006741 ** 
## tin_Open                                    0.024000 *  
## WDSP                                        0.160947    
## wheat_Low                                   0.091671 .  
## wheat_Open                                  0.070990 .  
## WTI_Close                                   0.023163 *  
## WTI_Low                                     0.028258 *  
## al_wafa_Total_Posts:Coalition14_Total_Posts 4.38e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.088 on 974 degrees of freedom
## Multiple R-squared:  0.5105, Adjusted R-squared:  0.4804 
## F-statistic: 16.93 on 60 and 974 DF,  p-value: < 2.2e-16
plot(VAR_EQ2)

test_df <- data.frame(predict(VAR_EQ2))

names(test_df)[1] <- 'Forecast'
test_df$Actual <- master_df[73:1107,223]

test_df_rmse <- sqrt(mean((test_df$Actual - test_df$Forecast)^2))
## [1] "RMSE of DYNLM is 2.02562082530908"

12. Univariate Time Series (Week 3 Lab)

Series.Dates<-as.Date(raw_df$event_date, "%d-%b-%y")
Series.Dates<-data.table(Series.Dates)
Series.By.Day<-Series.Dates[,.N,by=.(mday(Series.Dates),month(Series.Dates),year(Series.Dates))]

######## Reorganize and Reformat Dates 
Series.Daily.Record <-data.frame(paste(rev(Series.By.Day$month), '/',
                             rev(Series.By.Day$mday), '/',
                             rev(Series.By.Day$year)),
                             as.numeric(rev(Series.By.Day$N)))

colnames(Series.Daily.Record)<-c('Date', 'total_violent_events')
Series.Daily.Record$Date<-as.Date(as.character(Series.Daily.Record$Date),  #Convert Text to Date
                                  format="%m / %d /%Y")
tail(Series.Daily.Record)                 #Note the missing dates
##            Date total_violent_events
## 1237 2019-12-16                    1
## 1238 2019-12-17                    5
## 1239 2019-12-19                    2
## 1240 2019-12-20                    2
## 1241 2019-12-27                    3
## 1242 2019-12-29                    2
Date.Range<-seq(from = as.Date("2016-01-01"), to = as.Date("2019-12-31"), by = 'day')

Date.total_violent_events<-vector(mode="numeric", length=length(Date.Range))
for (i in 1:length(Date.Range)){
  if(Date.Range[i] %in% Series.Daily.Record$Date){
      Date.total_violent_events[i] <- Series.Daily.Record$total_violent_events[Series.Daily.Record$Date == Date.Range[i]]}
  else{Date.total_violent_events[i] <- 0}
}

Daily.Data.Frame <- data.frame(Date.Range, Date.total_violent_events)
tail(Daily.Data.Frame)#Confirm Problem Fixed
##      Date.Range Date.total_violent_events
## 1456 2019-12-26                         0
## 1457 2019-12-27                         3
## 1458 2019-12-28                         0
## 1459 2019-12-29                         2
## 1460 2019-12-30                         0
## 1461 2019-12-31                         0
Time<-paste(Daily.Data.Frame$Date.Range)
freq<-Date.total_violent_events
dates<-as.character(Date.Range)
plot.data<-data.frame(dates, freq)
n1 <- rCharts::Highcharts$new()
n1$yAxis(min=list(0),title=list(text='total_violent_events'))
n1$xAxis(categories = plot.data$dates, labels=list(rotation=-45, 
         align='right'), tickInterval=365, title=list(text='Date'))
n1$series(name='total_violent_events',type='column',data = as.list(plot.data$freq), 
          color = 'red')
n1$title(text = 'Daily Bahrain total_violent_events 01JAN16 - 31DEC19')
n1$chart(zoomType='x')  #Allows you to select sub-sections for zoom
n1
## <iframe src=' OA4106-Project_files/figure-html/unnamed-chunk-46-1.html ' scrolling='no' frameBorder='0' seamless class='rChart highcharts ' id=iframe- chart1f441783380f ></iframe> <style>iframe.rChart{ width: 100%; height: 400px;}</style>
#### Get Time Series Components for Seasonal Additive Model
Series.By.Day.TS<-ts(Daily.Data.Frame$Date.total_violent_events, start=c(2016,1,1), freq=7)
seasonal.model<-stl(Series.By.Day.TS, s.window="periodic")
plot(seasonal.model, main = 'Seasonal Decomposition (Bahrain Riots January 2016 - December 2019)')

seasonal.model.trend<-as.vector(seasonal.model$time.series[,2])
seasonal.model.seasons<-seasonal.model$time.series[,1][1:365]
seasonal.fit<-as.vector(seasonal.model.trend+as.vector(seasonal.model$time.series[,1]))
#head(seasonal.model$time.series)

# Remove zeros from dataframe
not.equal.to.zero.Series.By.Day.TS <- Series.By.Day.TS[Series.By.Day.TS != 0]
which.zeros <- which(Series.By.Day.TS == 0)
not.equal.to.zero.seasonal.fit <- seasonal.fit[-which.zeros]

Seasonal.Decomp.MAPE<-mean(abs(not.equal.to.zero.Series.By.Day.TS-not.equal.to.zero.seasonal.fit)/not.equal.to.zero.Series.By.Day.TS)

Seasonal.Decomp.MASE<-mean(abs(not.equal.to.zero.Series.By.Day.TS[366:1242]-not.equal.to.zero.seasonal.fit[366:1242]))/mean(abs(not.equal.to.zero.Series.By.Day.TS[366:1242]-not.equal.to.zero.Series.By.Day.TS[1:877]))
plot.data<-data.frame(Time, freq, seasonal.model.trend, seasonal.fit)
n3 <- rCharts::Highcharts$new()
n3$title(text="Daily Bahrain Violent Events January 2016 - December 2019")
n3$yAxis(min=list(0),title=list(text='Violent Events'))
n3$xAxis(categories = plot.data$Time, labels=list(rotation=-45, align='right'), tickInterval=24, title=list(text='Date'))
n3$series(name='Violent Event Count',type='column',data = as.list(plot.data$freq), color = 'red')
n3$series(name='Trend Line',type='line',data=as.list(plot.data$seasonal.model.trend), color='black')
n3$series(name='Fit Line',type='line',data=as.list(plot.data$seasonal.fit), color='grey')
n3$chart(zoomType='x')  #Allows you to select sub-sections for zoom
n3
## <iframe src=' OA4106-Project_files/figure-html/unnamed-chunk-47-1.html ' scrolling='no' frameBorder='0' seamless class='rChart highcharts ' id=iframe- chart1f4431c96b2c ></iframe> <style>iframe.rChart{ width: 100%; height: 400px;}</style>
## [1] "The MAPE of the seasonal model is 0.651029268908132"
## [1] "The MASE of the seasonal model is 0.397379374080167"
seasonal.forecast<-forecast(seasonal.model, method='naive')
plot.ts(seasonal.forecast$mean, ylim=c(0,45), lwd=2,       # Plot forecast
        main='Seasonal Model 2 Week Forecast')              
lines(seasonal.forecast$lower[,1], lwd=2, lty=2)            # 80% PI 
lines(seasonal.forecast$upper[,1], lwd=2, lty=2)            # 80% PI 
lines(seasonal.forecast$lower[,2], lwd=1, lty=2)            # 90% PI
lines(seasonal.forecast$upper[,2], lwd=1, lty=2)            # 90% PI

toJSONArray2 <- function(obj, json = TRUE, names = TRUE, ...){
  value = lapply(1:nrow(obj), function(i) {
    res <- as.list(obj[i, ])
    if (!names) names(res) <- NULL  # remove names (e.g. {x = 1, y = 2} => {1, 2})
    return(res)
  })
  if (json){
    return(toJSON(value, .withNames = F, ...))
  } else {
    names(value) <- NULL;
    return(value)
  }
}
### Create Series for Forecast Plots By Appending 'NA' data as needed
seasonal.forecast.mean<-c(rep(NA, length=length(Series.By.Day.TS)), seasonal.forecast$mean[1:14])
seasonal.forecast.lower<-c(rep(NA, length=length(Series.By.Day.TS)), seasonal.forecast$lower[1:14,1])
seasonal.forecast.upper<-c(rep(NA, length=length(Series.By.Day.TS)), seasonal.forecast$upper[1:14,1])
Time.Add1<-paste0("2020-01-0", rep(1:9, each = 1), setp = "") 
Time.Add2<-paste0("2020-01-", rep(10:14, each =1), setp = "")
Time.new<-c(Time, Time.Add1, Time.Add2)                   
freq.new<-c(Daily.Data.Frame$Date.total_violent_events, rep(NA, 14))
seasonal.fit.new<-c(seasonal.fit, rep(NA, 14))

plot.data<-transform(data.frame(Time.new, freq.new, seasonal.fit.new, seasonal.forecast.mean,
              seasonal.forecast.lower, seasonal.forecast.upper))
n2 <- rCharts::Highcharts$new()
n2$title(text="Daily Bahrain Violent Events January 2016 - December 2019")
n2$yAxis(min=list(0),title=list(text='Bahrain total_violent_events'))
n2$xAxis(categories = plot.data$Time.new, labels=list(rotation=-45, align='right'), tickInterval=365, title=list(text='Date'))
n2$series(
  name = '80% Forecast Interval',
  data = toJSONArray2(plot.data[,c('Time.new', 'seasonal.forecast.lower', 'seasonal.forecast.upper')], names = T, json = F),
  type = 'arearange',
  fillOpacity = 0.8,
  lineWidth = 0,
  color = 'lightgrey',
  zIndex = 0
)
n2$series(name='total_violent_events',type='column',data = as.list(plot.data$freq.new), color = 'red')
n2$series(name='Fit Line',type='line',data=as.list(plot.data$seasonal.fit.new), color='black')
n2$series(name='Forecast',type='line',data=as.list(plot.data$seasonal.forecast.mean), color='grey')
n2$chart(zoomType='x')  #Allows you to select sub-sections for zoom
n2
## <iframe src=' OA4106-Project_files/figure-html/unnamed-chunk-52-1.html ' scrolling='no' frameBorder='0' seamless class='rChart highcharts ' id=iframe- chart1f44289c3cea ></iframe> <style>iframe.rChart{ width: 100%; height: 400px;}</style>
HW.forecast <- HoltWinters(Series.By.Day.TS, alpha = TRUE, beta = TRUE, gamma = FALSE)

HW.predict <- predict(HW.forecast, prediction.interval = TRUE)
plot(HW.forecast, HW.predict)

Training.Series<-list()
Seasonal.model<-list()
Seasonal.Forecast<-list()
Naive.Forecast<-list()
Observed<-list()
Seasonal.Forecast.Errors<-list()
Naive.Forecast.Errors<-list()
for(i in 1447:1461){
  Training.Series[[i]]<-ts(Series.By.Day.TS[1:(i-1)], start=c(2010,1, 1), freq=7)
  Naive.Forecast[[i]]<-as.vector(Series.By.Day.TS[(i-14):(i-1)])  #14 day forecast
  Seasonal.model[[i]]<-stl(Training.Series[[i]], s.window="periodic")
  Seasonal.Forecast[[i]]<-as.vector(forecast(Seasonal.model[[i]], method='naive')$mean)[1:14]
  Observed[[i]]<-as.vector(Series.By.Day.TS[i:(i+13)])
  Naive.Forecast.Errors[[i]]<-Observed[[i]]-Naive.Forecast[[i]] #14 days of error
  Seasonal.Forecast.Errors[[i]]<-Seasonal.Forecast[[i]]-Observed[[i]]
}
### Seasonal Model Performance Evaluation
Seasonal.Error.Table<-do.call(rbind, Seasonal.Forecast.Errors)
Naive.Error.Table<-do.call(rbind, Naive.Forecast.Errors)
Observed.Table<-do.call(rbind, Observed)
Seasonal.1Step.MAPE<-mean(abs(Seasonal.Error.Table[,1])/Observed.Table[,1])
Seasonal.1Yr.MAPE<-mean(abs(Seasonal.Error.Table[1:2,])/Observed.Table[1:2,])
Seasonal.1Step.MASE<-mean(abs(Seasonal.Error.Table[,1]))/mean(abs(Naive.Error.Table[,1]))
Seasonal.1Yr.MASE<-mean(abs(Seasonal.Error.Table[1:2,]))/mean(abs(Naive.Error.Table[1:2,]))
Seasonal.Decomp.MASE<-mean(abs(Series.By.Day.TS[365:1461]-seasonal.fit[365:1461]))/mean(abs(Series.By.Day.TS[365:1461]-Series.By.Day.TS[1:1097]))
Seasonal.Fitted.Performance<-c(Seasonal.Decomp.MAPE, Seasonal.Decomp.MASE)
Seasonal.Forecast.Performance.1<-c(Seasonal.1Yr.MAPE, Seasonal.1Yr.MASE)
# One Step Ahead Rolling Horizon Prediction
Training.Series<-list()
Naive.Forecast<-list()
Seasonal.model<-list()
Seasonal.Forecast<-list()
ARIMA.model<-list()
ARIMA.Forecast<-list()
HW.model<-list()
HW.Forecast<-list()
Observed<-list()
Naive.Forecast.Errors<-list()
HW.Forecast.Errors<-list()
ARIMA.Forecast.Errors<-list()
for(i in 1447:1461){    # Need 3 years of observation for setup of seasonal models 
  Training.Series[[i]]<-ts(Series.By.Day.TS[1:(i-1)], start=c(2016,1, 1), freq=7)
  Naive.Forecast[[i]]<-as.vector(Series.By.Day.TS[(i-14):(i-1)])  #14 day forecast
  HW.model[[i]]<-HoltWinters(Training.Series[[i]])
  HW.Forecast[[i]]<-forecast(HW.model[[i]])$mean[1:14]  #14 day error
  Seasonal.model[[i]]<-stl(Training.Series[[i]], s.window="periodic")
  Seasonal.Forecast[[i]]<-as.vector(forecast(Seasonal.model[[i]], method='naive')$mean)[1:14]  
  ARIMA.model[[i]]<-auto.arima(Training.Series[[i]])
  ARIMA.Forecast[[i]]<-forecast(ARIMA.model[[i]])$mean[1:14]  #14 day forecast
  Observed[[i]]<-as.vector(Series.By.Day.TS[i:(i+13)])  #next 14 day observations
  Seasonal.Forecast.Errors[[i]]<-Seasonal.Forecast[[i]]-Observed[[i]]
  Naive.Forecast.Errors[[i]]<-Observed[[i]]-Naive.Forecast[[i]] #14 days of error
  HW.Forecast.Errors[[i]]<-Observed[[i]]-HW.Forecast[[i]]
  ARIMA.Forecast.Errors[[i]]<-Observed[[i]]-ARIMA.Forecast[[i]]
  #print(i)  #so we can see progress
}
ARIMA.1Step.Forecast<-do.call(rbind, ARIMA.Forecast)[,1]
HW.1Step.Forecast<-do.call(rbind, HW.Forecast)[,1]
Seasonal.1Step.Forecast<-do.call(rbind, Seasonal.Forecast)[,1]
Naive.1Step.Forecast<-do.call(rbind, Naive.Forecast)[,1]
##### HighChart of Forecasting Model 1-Step Ahead Performance
Time<-Time[1447:1461]
freq<-Daily.Data.Frame$Date.total_violent_events[1447:1461]
x<-data.frame(Time, freq,  Seasonal.1Step.Forecast, HW.1Step.Forecast, ARIMA.1Step.Forecast)
n5 <- rCharts::Highcharts$new()
n5$yAxis(min=list(0),title=list(text='Bahrain Violent Events'))
n5$xAxis(categories = x$Time, labels=list(rotation=-45, align='right'), tickInterval=24, title=list(text='Date'))
n5$series(name='Violent Event Count',type='column',data = as.list(x$freq), color = 'red')
n5$series(name='Seasonal',type='line',data=as.list(x$Seasonal.1Step.Forecast), color='grey')
n5$series(name='ARIMA',type='line',data=as.list(x$ARIMA.1Step.Forecast), color='blue')
n5$series(name='Holt-Winters',type='line',data=as.list(x$HW.1Step.Forecast), color='green')
n5$chart(zoomType='x')  #Allows you to select sub-sections for zoom
n5
## <iframe src=' OA4106-Project_files/figure-html/unnamed-chunk-57-1.html ' scrolling='no' frameBorder='0' seamless class='rChart highcharts ' id=iframe- chart1f444e0c3695 ></iframe> <style>iframe.rChart{ width: 100%; height: 400px;}</style>
### Calculate MAPE Performance for 1 Step and 1 YR Forecasts
Observed.Table<-do.call(rbind, Observed)
Naive.Error.Table<-do.call(rbind, Naive.Forecast.Errors)
HW.Error.Table<-do.call(rbind, HW.Forecast.Errors)
ARIMA.Error.Table<-do.call(rbind, ARIMA.Forecast.Errors)
index <- which(Observed.Table[,1] == 0)
index2 <- which(Observed.Table[1,] == 0)
index3 <- which(Observed.Table[2,] == 0)

Naive.1Step.MAPE<-mean(abs(Naive.Error.Table[-index,1])/Observed.Table[-index,1])
Naive.1Yr.MAPE<-mean(c(abs(Naive.Error.Table[1,-index2])/Observed.Table[1,-index2], abs(Naive.Error.Table[2,-index3])/Observed.Table[2,-index3]))
Seasonal.1Step.MAPE<-mean(abs(Seasonal.Error.Table[-index,1])/Observed.Table[-index,1])
Seasonal.1Yr.MAPE<-mean(c(abs(Seasonal.Error.Table[1,-index2])/Observed.Table[1,-index2], abs(Seasonal.Error.Table[2,-index3])/Observed.Table[2,-index3]))  

HW.1Step.MAPE<-mean(abs(HW.Error.Table[-index,1])/Observed.Table[-index,1])
HW.1Yr.MAPE<-mean(c(abs(HW.Error.Table[1,-index2])/Observed.Table[1,-index2], abs(HW.Error.Table[2,-index3])/Observed.Table[2,-index3]))  
ARIMA.1Step.MAPE<-mean(abs(ARIMA.Error.Table[-index,1])/Observed.Table[-index,1])
ARIMA.1Yr.MAPE<-mean(c(abs(ARIMA.Error.Table[1,-index2])/Observed.Table[1,-index2], abs(ARIMA.Error.Table[2,-index3])/Observed.Table[2,-index3]))

####### Calculate MASE Performance
Seasonal.1Step.MASE<-mean(abs(Seasonal.Error.Table[,1]))/mean(abs(Naive.Error.Table[,1]))
Seasonal.1Yr.MASE<-mean(abs(Seasonal.Error.Table[1:2,]))/mean(abs(Naive.Error.Table[1:2,]))
HW.1Step.MASE<-mean(abs(HW.Error.Table[,1]))/mean(abs(Naive.Error.Table[,1]))
HW.1Yr.MASE<-mean(abs(HW.Error.Table[1:2,]))/mean(abs(Naive.Error.Table[1:2,]))
ARIMA.1Step.MASE<-mean(abs(ARIMA.Error.Table[,1]))/mean(abs(Naive.Error.Table[,1]))
ARIMA.1Yr.MASE<-mean(abs(ARIMA.Error.Table[1:2,]))/mean(abs(Naive.Error.Table[1:2,]))

# Create the table rows
Naive.Performance<-cbind(Naive.1Step.MAPE, Naive.1Yr.MAPE, 1, 1)
Seasonal.Performance<-cbind(Seasonal.1Step.MAPE, Seasonal.1Yr.MAPE, Seasonal.1Step.MASE, Seasonal.1Yr.MASE)
HW.Performance<-cbind(HW.1Step.MAPE, HW.1Yr.MAPE, HW.1Step.MASE, HW.1Yr.MASE)
ARIMA.Performance<-cbind(ARIMA.1Step.MAPE, ARIMA.1Yr.MAPE, ARIMA.1Step.MASE, ARIMA.1Yr.MASE)

# Create the table for display
Performance.Matrix1<-rbind(Naive.Performance, Seasonal.Performance, HW.Performance, ARIMA.Performance)
Performance.Matrix1<-signif(Performance.Matrix1, 2)
Performance.Matrix1<-cbind(c("Naive", "Seasonal Decomposition", "Holt-Winters", "ARIMA"), Performance.Matrix1)
Performance.Matrix1<-data.frame(Performance.Matrix1)
colnames(Performance.Matrix1)<-c("Model", "1 Step MAPE", "12 Step MAPE", "1 Step MASE", "12 Step MASE")
Performance.Matrix1<-knitr::kable(Performance.Matrix1, digits=2) #prints a pretty table
Performance.Matrix1
Model 1 Step MAPE 12 Step MAPE 1 Step MASE 12 Step MASE
Naive 0.57 0.52 1 1
Seasonal Decomposition 1.2 1.8 2.1 5.2
Holt-Winters 0.69 0.59 1.1 2.2
ARIMA 0.65 0.53 1.3 1.6
# Build and Evaluate an Ensemble Forecast
Seasonal.Forecast.Table<-do.call(rbind, Seasonal.Forecast)
HW.Forecast.Table<-do.call(rbind, HW.Forecast)
ARIMA.Forecast.Table<-do.call(rbind, ARIMA.Forecast)
Ensemble.Forecast.Table<-(Seasonal.Forecast.Table + HW.Forecast.Table + ARIMA.Forecast.Table)/3
Ensemble.1Step.Forecast<-Ensemble.Forecast.Table[,1]
Ensemble.Error.Table<-Observed.Table-Ensemble.Forecast.Table
Ensemble.1Step.MAPE<-mean(abs(Ensemble.Error.Table[-index,1])/Observed.Table[-index,1]) 
Ensemble.1Yr.MAPE<-mean(abs(Ensemble.Error.Table[1,-index2])/Observed.Table[1,-index2]) 
Ensemble.1Step.MASE<-mean(abs(Ensemble.Error.Table[,1]))/mean(abs(Naive.Error.Table[,1]))
Ensemble.1Yr.MASE<-mean(abs(Ensemble.Error.Table[1:2,]))/mean(abs(Naive.Error.Table[1:2,]))

# Building rows of table again
Naive.Performance<-cbind(Naive.1Step.MAPE, Naive.1Yr.MAPE, 1, 1)
Seasonal.Performance<-cbind(Seasonal.1Step.MAPE, Seasonal.1Yr.MAPE, Seasonal.1Step.MASE, Seasonal.1Yr.MASE)
HW.Performance<-cbind(HW.1Step.MAPE, HW.1Yr.MAPE, HW.1Step.MASE, HW.1Yr.MASE)
ARIMA.Performance<-cbind(ARIMA.1Step.MAPE, ARIMA.1Yr.MAPE, ARIMA.1Step.MASE, ARIMA.1Yr.MASE)
Ensemble.Performance<-cbind(Ensemble.1Step.MAPE, Ensemble.1Yr.MAPE, Ensemble.1Step.MASE, Ensemble.1Yr.MASE)

# Building full table again
Performance.Matrix<-rbind(Naive.Performance, Seasonal.Performance, HW.Performance, ARIMA.Performance, Ensemble.Performance)
Performance.Matrix<-signif(Performance.Matrix, 2)
Performance.Matrix<-cbind(c("Naive", "Seasonal Decomposition", "Holt-Winters", "ARIMA", 
                            "Ensemble"), Performance.Matrix)
Performance.Matrix<-data.frame(Performance.Matrix)
colnames(Performance.Matrix)<-c("Model", "1 Step MAPE", "12 Step MAPE", "1 Step MASE", "12 Step MASE")
Performance.Matrix<-knitr::kable(Performance.Matrix, digits=2, caption='Model Performance Comparison')
Performance.Matrix
Model Performance Comparison
Model 1 Step MAPE 12 Step MAPE 1 Step MASE 12 Step MASE
Naive 0.57 0.52 1 1
Seasonal Decomposition 1.2 1.8 2.1 5.2
Holt-Winters 0.69 0.59 1.1 2.2
ARIMA 0.65 0.53 1.3 1.6
Ensemble 0.75 0.61 1.1 2.8

We can see from the performance matrix that the MAPE is above 1 for a model, which would normally be an issue. However, since we eliminated values that are equal to zero we are not getting an accurate depiction of the prediction power.

n5$series(name='Ensemble',type='line',data=as.list(Ensemble.1Step.Forecast), color='black')
n5
## <iframe src=' OA4106-Project_files/figure-html/unnamed-chunk-60-1.html ' scrolling='no' frameBorder='0' seamless class='rChart highcharts ' id=iframe- chart1f444e0c3695 ></iframe> <style>iframe.rChart{ width: 100%; height: 400px;}</style>

13. Rolling Horizon/Random Forest

# Read in the master_df.csv that contains predictors and response
master_df <- read.csv('master_df.csv', header = TRUE)

d <- dim(master_df)

master_df1 <- master_df[,c(1, 5, 26, 29, 47, 68, 223)]

summary(master_df1)
##   event_date        al_wafa_Total_Posts Coalition14_Total_Posts
##  Length:1461        Min.   :  0.00      Min.   :  0.00         
##  Class :character   1st Qu.:  1.00      1st Qu.: 15.00         
##  Mode  :character   Median :  7.00      Median : 26.00         
##                     Mean   : 14.24      Mean   : 28.84         
##                     3rd Qu.: 22.00      3rd Qu.: 39.00         
##                     Max.   :220.00      Max.   :199.00         
##  duraz_youth_Total_Posts khamenei_Total_Posts trump_Total_Posts
##  Min.   :  0.00          Min.   : 0.000       Min.   : 0.000   
##  1st Qu.:  2.00          1st Qu.: 0.000       1st Qu.: 5.000   
##  Median :  8.00          Median : 0.000       Median : 8.000   
##  Mean   : 14.49          Mean   : 1.576       Mean   : 9.424   
##  3rd Qu.: 20.00          3rd Qu.: 2.000       3rd Qu.:13.000   
##  Max.   :183.00          Max.   :39.000       Max.   :76.000   
##    riot_count    
##  Min.   : 0.000  
##  1st Qu.: 0.000  
##  Median : 1.000  
##  Mean   : 2.216  
##  3rd Qu.: 3.000  
##  Max.   :29.000
Series.Dates<-as.Date(master_df1$event_date, "%m/%d/%Y")
first_date <- min(Series.Dates)
last_date <- max(Series.Dates)
first_date
## [1] "2016-01-01"
last_date
## [1] "2019-12-31"
full_dates <- data.table(
  Series.Dates = seq(first_date, last_date, "1 day")
)

length(Series.Dates)
## [1] 1461
dim(full_dates)
## [1] 1461    1

Now we remove the column for dates:

master_df_new <- master_df1[, -1]
summary(master_df_new)
##  al_wafa_Total_Posts Coalition14_Total_Posts duraz_youth_Total_Posts
##  Min.   :  0.00      Min.   :  0.00          Min.   :  0.00         
##  1st Qu.:  1.00      1st Qu.: 15.00          1st Qu.:  2.00         
##  Median :  7.00      Median : 26.00          Median :  8.00         
##  Mean   : 14.24      Mean   : 28.84          Mean   : 14.49         
##  3rd Qu.: 22.00      3rd Qu.: 39.00          3rd Qu.: 20.00         
##  Max.   :220.00      Max.   :199.00          Max.   :183.00         
##  khamenei_Total_Posts trump_Total_Posts   riot_count    
##  Min.   : 0.000       Min.   : 0.000    Min.   : 0.000  
##  1st Qu.: 0.000       1st Qu.: 5.000    1st Qu.: 0.000  
##  Median : 0.000       Median : 8.000    Median : 1.000  
##  Mean   : 1.576       Mean   : 9.424    Mean   : 2.216  
##  3rd Qu.: 2.000       3rd Qu.:13.000    3rd Qu.: 3.000  
##  Max.   :39.000       Max.   :76.000    Max.   :29.000

Setting up for a rolling horizon model.

### Define s
s <- 1

### Initial train set sample size
i.train.sample <- 700

### One time period
p <- 10

### Define t
t <- i.train.sample + s

### Defining a train set
# Define predictors
## For time t - 1
al_wafa_Total_Posts1 <- master_df_new[1:(t-1), 1]
Coalition14_Total_Posts1 <- master_df_new[1:(t-1), 2]
duraz_youth_Total_Posts1 <- master_df_new[1:(t-1), 3]
khamenei_Total_Posts1 <- master_df_new[1:(t-1), 4]
trump_Total_Posts1 <- master_df_new[1:(t-1), 5]
count1 <- master_df_new[1:(t-1), 6]

## For time t
al_wafa_Total_Posts2 <- master_df_new[2:t, 1]
Coalition14_Total_Posts2 <- master_df_new[2:t, 2]
duraz_youth_Total_Posts2 <- master_df_new[2:t, 3]
khamenei_Total_Posts2 <- master_df_new[2:t, 4]
trump_Total_Posts2 <- master_df_new[2:t, 5]
count2 <- master_df_new[2:t, 6]

# Now define the target
count3 <- master_df_new[(2+s):(t+s), 6]

train <- data.frame(al_wafa_Total_Posts1, Coalition14_Total_Posts1, duraz_youth_Total_Posts1, khamenei_Total_Posts1, trump_Total_Posts1, count1, al_wafa_Total_Posts2, Coalition14_Total_Posts2, duraz_youth_Total_Posts2, khamenei_Total_Posts2, trump_Total_Posts2, count2, count3)
summary(train)
##  al_wafa_Total_Posts1 Coalition14_Total_Posts1 duraz_youth_Total_Posts1
##  Min.   :  0.00       Min.   :  9.00           Min.   :  1.00          
##  1st Qu.: 12.00       1st Qu.: 29.00           1st Qu.: 10.00          
##  Median : 22.00       Median : 38.00           Median : 18.00          
##  Mean   : 25.94       Mean   : 41.94           Mean   : 24.95          
##  3rd Qu.: 32.00       3rd Qu.: 50.25           3rd Qu.: 33.00          
##  Max.   :220.00       Max.   :199.00           Max.   :183.00          
##  khamenei_Total_Posts1 trump_Total_Posts1     count1       al_wafa_Total_Posts2
##  Min.   : 0.000        Min.   : 0.000     Min.   : 0.000   Min.   :  0.00      
##  1st Qu.: 0.000        1st Qu.: 4.000     1st Qu.: 1.000   1st Qu.: 12.00      
##  Median : 0.000        Median : 7.000     Median : 2.000   Median : 22.00      
##  Mean   : 1.927        Mean   : 8.587     Mean   : 2.969   Mean   : 25.98      
##  3rd Qu.: 2.000        3rd Qu.:11.000     3rd Qu.: 4.000   3rd Qu.: 32.00      
##  Max.   :39.000        Max.   :76.000     Max.   :29.000   Max.   :220.00      
##  Coalition14_Total_Posts2 duraz_youth_Total_Posts2 khamenei_Total_Posts2
##  Min.   :  9.00           Min.   :  1.00           Min.   : 0.000       
##  1st Qu.: 29.00           1st Qu.: 10.00           1st Qu.: 0.000       
##  Median : 38.00           Median : 18.00           Median : 0.000       
##  Mean   : 41.86           Mean   : 24.94           Mean   : 1.929       
##  3rd Qu.: 50.00           3rd Qu.: 33.00           3rd Qu.: 2.000       
##  Max.   :199.00           Max.   :183.00           Max.   :39.000       
##  trump_Total_Posts2     count2           count3      
##  Min.   : 0.000     Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 4.000     1st Qu.: 1.000   1st Qu.: 1.000  
##  Median : 7.000     Median : 2.000   Median : 2.000  
##  Mean   : 8.579     Mean   : 2.971   Mean   : 2.964  
##  3rd Qu.:11.000     3rd Qu.: 4.000   3rd Qu.: 4.000  
##  Max.   :76.000     Max.   :29.000   Max.   :29.000
### Defining a test set
# Define predictors
## For time t - 1
al_wafa_Total_Posts1 <- master_df_new[t:(t+p-1), 1]
Coalition14_Total_Posts1 <- master_df_new[t:(t+p-1), 2]
duraz_youth_Total_Posts1 <- master_df_new[t:(t+p-1), 3]
khamenei_Total_Posts1 <- master_df_new[t:(t+p-1), 4]
trump_Total_Posts1 <- master_df_new[t:(t+p-1), 5]
count1 <- master_df_new[t:(t+p-1), 6]

## For time t
al_wafa_Total_Posts2 <- master_df_new[(t+1):(t+p), 1]
Coalition14_Total_Posts2 <- master_df_new[(t+1):(t+p), 2]
duraz_youth_Total_Posts2 <- master_df_new[(t+1):(t+p), 3]
khamenei_Total_Posts2 <- master_df_new[(t+1):(t+p), 4]
trump_Total_Posts2 <- master_df_new[(t+1):(t+p), 5]
count2 <- master_df_new[(t+1):(t+p), 6]

# Now define the target
count3 <- master_df_new[(t+1+s):(t+p+s), 6]

test <- data.frame(al_wafa_Total_Posts1, Coalition14_Total_Posts1, duraz_youth_Total_Posts1, khamenei_Total_Posts1, trump_Total_Posts1, count1, al_wafa_Total_Posts2, Coalition14_Total_Posts2, duraz_youth_Total_Posts2, khamenei_Total_Posts2, trump_Total_Posts2, count2, count3)
summary(test)
##  al_wafa_Total_Posts1 Coalition14_Total_Posts1 duraz_youth_Total_Posts1
##  Min.   : 5.00        Min.   :11.00            Min.   : 3.00           
##  1st Qu.:13.25        1st Qu.:23.50            1st Qu.:13.25           
##  Median :17.50        Median :32.50            Median :20.00           
##  Mean   :20.40        Mean   :32.00            Mean   :18.30           
##  3rd Qu.:25.75        3rd Qu.:37.75            3rd Qu.:24.75           
##  Max.   :49.00        Max.   :52.00            Max.   :30.00           
##  khamenei_Total_Posts1 trump_Total_Posts1     count1     al_wafa_Total_Posts2
##  Min.   :0.0           Min.   : 2.0       Min.   :0.00   Min.   : 5.00       
##  1st Qu.:0.0           1st Qu.: 4.0       1st Qu.:2.25   1st Qu.:13.25       
##  Median :0.5           Median : 5.0       Median :3.00   Median :16.50       
##  Mean   :1.3           Mean   : 6.3       Mean   :3.20   Mean   :17.20       
##  3rd Qu.:1.0           3rd Qu.: 8.0       3rd Qu.:4.00   3rd Qu.:21.25       
##  Max.   :9.0           Max.   :13.0       Max.   :6.00   Max.   :29.00       
##  Coalition14_Total_Posts2 duraz_youth_Total_Posts2 khamenei_Total_Posts2
##  Min.   :11.00            Min.   : 3.00            Min.   :0.0          
##  1st Qu.:23.00            1st Qu.:12.50            1st Qu.:0.0          
##  Median :27.00            Median :17.00            Median :0.0          
##  Mean   :30.70            Mean   :16.70            Mean   :1.2          
##  3rd Qu.:37.75            3rd Qu.:23.75            3rd Qu.:1.0          
##  Max.   :52.00            Max.   :28.00            Max.   :9.0          
##  trump_Total_Posts2     count2         count3    
##  Min.   : 1.0       Min.   :0.00   Min.   :0.00  
##  1st Qu.: 4.0       1st Qu.:2.25   1st Qu.:2.25  
##  Median : 4.0       Median :3.00   Median :3.00  
##  Mean   : 5.6       Mean   :3.00   Mean   :2.80  
##  3rd Qu.: 7.5       3rd Qu.:4.00   3rd Qu.:3.75  
##  Max.   :13.0       Max.   :5.00   Max.   :4.00
## Applying RF

RandomForest <- randomForest(count3 ~ ., data=train, importance = TRUE, ntrees = 500)
predRF <- predict(RandomForest, newdata = test, type = "response")
summary(predRF)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.929   2.286   2.481   2.893   3.496   4.700
## MSE
mse <- sum((predRF - test$count3)^2)/length(test$count3)
mse
## [1] 2.672367
## MAPE
## Finding zero count
index <- which(test$count3 == 0)
## MAPE
mape <- sum(abs(predRF[-index] - test$count3[-index]) / test$count3[-index])/length(test$count3[-index])
## [1] "MASE is 2.6724"
## [1] "MAPE is 0.3701"

First, we plot how it did with the train set. Black line is the truth and red line is the prediction.

plot(1:length(train$count3), train$count3, type="l")
lines(1:length(RandomForest$predicted), round(RandomForest$predicted), col = "red")

Now we plot how it did with the test set. Black line is the truth and red line is the prediction.

plot(1:length(test$count3), test$count3, type="l")
lines(1:length(predRF), round(predRF), col = "red")

Rolling Holizon Design

We will do the rolling horizon model.

RF.MSE <- list()
RF.MAPE <- list()
RF.MSE[[1]] <- mse
RF.MAPE[[1]] <- mape
k <- 1

for(t in (i.train.sample + s + 1):(d[1]-p-s)){
#for(t in (i.train.sample + s + 1):(i.train.sample + s + 11)){
  k <- k + 1
  ### Defining a train set
  # Define predictors
  ## For time t - 1
  al_wafa_Total_Posts1 <- master_df_new [1:(t-1), 1]
  Coalition14_Total_Posts1 <- master_df_new [1:(t-1), 2]
  duraz_youth_Total_Posts1 <- master_df_new [1:(t-1), 3]
  khamenei_Total_Posts1 <- master_df_new [1:(t-1), 4]
  trump_Total_Posts1 <- master_df_new [1:(t-1), 5]
  riot_count1 <- master_df_new [1:(t-1), 6]

  ## For time t
  al_wafa_Total_Posts2 <- master_df_new [2:t, 1]
  Coalition14_Total_Posts2 <- master_df_new [2:t, 2]
  duraz_youth_Total_Posts2 <- master_df_new [2:t, 3]
  khamenei_Total_Posts2 <- master_df_new [2:t, 4]
  trump_Total_Posts2 <- master_df_new [2:t, 5]
  riot_count2 <- master_df_new [2:t, 6]

  # Now define the target
  riot_count3 <- master_df_new [(2+s):(t+s), 6]

  train <- data.frame(al_wafa_Total_Posts1, Coalition14_Total_Posts1, duraz_youth_Total_Posts1, khamenei_Total_Posts1, trump_Total_Posts1, riot_count1, al_wafa_Total_Posts2, Coalition14_Total_Posts2, duraz_youth_Total_Posts2, khamenei_Total_Posts2, trump_Total_Posts2, riot_count2, riot_count3)
  summary(train)

  ### Defining a test set
  # Define predictors
  ## For time t - 1
  al_wafa_Total_Posts1 <- master_df_new [t:(t+p-1), 1]
  Coalition14_Total_Posts1 <- master_df_new [t:(t+p-1), 2]
  duraz_youth_Total_Posts1 <- master_df_new [t:(t+p-1), 3]
  khamenei_Total_Posts1 <- master_df_new [t:(t+p-1), 4]
  trump_Total_Posts1 <- master_df_new [t:(t+p-1), 5]
  riot_count1 <- master_df_new [t:(t+p-1), 6]

  ## For time t
  al_wafa_Total_Posts2 <- master_df_new [(t+1):(t+p), 1]
  Coalition14_Total_Posts2 <- master_df_new [(t+1):(t+p), 2]
  duraz_youth_Total_Posts2 <- master_df_new [(t+1):(t+p), 3]
  khamenei_Total_Posts2 <- master_df_new [(t+1):(t+p), 4]
  trump_Total_Posts2 <- master_df_new [(t+1):(t+p), 5]
  riot_count2 <- master_df_new [(t+1):(t+p), 6]

  # Now define the target
  riot_count3 <- master_df_new [(t+1+s):(t+p+s), 6]

  test <- data.frame(al_wafa_Total_Posts1, Coalition14_Total_Posts1, duraz_youth_Total_Posts1, khamenei_Total_Posts1, trump_Total_Posts1, riot_count1, al_wafa_Total_Posts2, Coalition14_Total_Posts2, duraz_youth_Total_Posts2, khamenei_Total_Posts2, trump_Total_Posts2, riot_count2, riot_count3)
summary(test)

  ## Applying RF

  RandomForest <- randomForest(riot_count3 ~ ., data=train, importance = TRUE, ntrees = 500)
  predRF <- predict(RandomForest, newdata = test, type = "response")
  summary(predRF)
  ## MSE
  mse <- sum((predRF - test$riot_count3)^2)/length(test$riot_count3)
  mse
  ## MAPE
  ## Finding zero riot_count
  index <- which(test$riot_count3 == 0)
  ## MAPE
  mape <- sum(abs(predRF[-index] - test$riot_count3[-index])/test$riot_count3[-index])/length(test$riot_count3[-index])
  mape
  RF.MSE[[k]] <- mse
  RF.MAPE[[k]] <- mape
}

### Computing the overall MSE and MAPE
RF.MSE.Table <- do.call(rbind, RF.MSE)
RF.MAPE.Table <- do.call(rbind, RF.MAPE)
## [1] "The overal MASE is 4.3578"
## [1] "The overal MAPE is 0.5801"

First, we plot how it did with the train set. Black line is the truth and red line is the prediction.

plot(1:length(train$riot_count3), train$riot_count3, type="l", main = 'Rolling Horizon Bahrain Riots (01JAN16 - 31DEC19)', xlab = 'Date', ylab = 'Count')
lines(1:length(RandomForest$predicted), RandomForest$predicted, col = "red")
legend(900, 29, legend=c("True Count", "Rolling Hoizon/RF Forecast"),
       col=c("black", "red"),  lty=1:1, cex=0.8)

Now we plot how it did with the test set. Black line is the truth and red line is the prediction.

plot(1:length(test$riot_count3), test$riot_count3, type="l", main = 'Rolling Horizon Bahrain Riots (Test Set)', xlab = 'Date', ylab = 'Count')
lines(1:length(predRF), round(predRF), col = "red")
legend(1, 2, legend=c("True Count", "Rolling Hoizon/RF Forecast"),
       col=c("black", "red"),  lty=1:1, cex=0.8)

14 PCA & RegSubsets/Random Forest/Rolling Horizon

The team utilizes a reg Subsets to determine the optimal model with our new PCA value and we will utilize all explanatory variables.

# Read in the file
master_df <- read.csv('master_df.csv')

# Determine dimensions of dataframe
d <- dim(master_df) 

# Set a boundary to capture ACLED data
s <- (d[2]-33):d[2] 

# Create a subset of the data
events <- master_df[, s] 

# Run a summary
summary(events) 
##  Armed_clash_count  Armed_clash_fatalities  battle_count     
##  Min.   :0.000000   Min.   :0.000000       Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.000000       1st Qu.:0.000000  
##  Median :0.000000   Median :0.000000       Median :0.000000  
##  Mean   :0.004791   Mean   :0.002738       Mean   :0.004791  
##  3rd Qu.:0.000000   3rd Qu.:0.000000       3rd Qu.:0.000000  
##  Max.   :1.000000   Max.   :3.000000       Max.   :1.000000  
##  battle_fatalities    SAMA_count        SAMA_fatalities  RELIED_count     
##  Min.   :0.000000   Min.   :0.0000000   Min.   :0       Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.0000000   1st Qu.:0       1st Qu.:0.000000  
##  Median :0.000000   Median :0.0000000   Median :0       Median :0.000000  
##  Mean   :0.002738   Mean   :0.0006845   Mean   :0       Mean   :0.008898  
##  3rd Qu.:0.000000   3rd Qu.:0.0000000   3rd Qu.:0       3rd Qu.:0.000000  
##  Max.   :3.000000   Max.   :1.0000000   Max.   :0       Max.   :1.000000  
##  RELIED_fatalities  Grenade_count       Grenade_fatalities explosion_count  
##  Min.   :0.000000   Min.   :0.0000000   Min.   :0          Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.:0.0000000   1st Qu.:0          1st Qu.:0.00000  
##  Median :0.000000   Median :0.0000000   Median :0          Median :0.00000  
##  Mean   :0.002738   Mean   :0.0006845   Mean   :0          Mean   :0.01027  
##  3rd Qu.:0.000000   3rd Qu.:0.0000000   3rd Qu.:0          3rd Qu.:0.00000  
##  Max.   :1.000000   Max.   :1.0000000   Max.   :0          Max.   :1.00000  
##  explosions_fatalities Sexual_violence_count sexual_violence_fatalities
##  Min.   :0.000000      Min.   :0.000000      Min.   :0                 
##  1st Qu.:0.000000      1st Qu.:0.000000      1st Qu.:0                 
##  Median :0.000000      Median :0.000000      Median :0                 
##  Mean   :0.002738      Mean   :0.004107      Mean   :0                 
##  3rd Qu.:0.000000      3rd Qu.:0.000000      3rd Qu.:0                 
##  Max.   :1.000000      Max.   :2.000000      Max.   :0                 
##   Attack_count     Attack_fatalities    VAC_count       VAC_fatalities    
##  Min.   :0.00000   Min.   :0.000000   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.000000  
##  Median :0.00000   Median :0.000000   Median :0.00000   Median :0.000000  
##  Mean   :0.00616   Mean   :0.002053   Mean   :0.01027   Mean   :0.002053  
##  3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0.000000  
##  Max.   :1.00000   Max.   :1.000000   Max.   :2.00000   Max.   :1.000000  
##    EFAP_count       EFAP_fatalities      PWI_count      PWI_fatalities
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.0000   Min.   :0     
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.0000   1st Qu.:0     
##  Median :0.000000   Median :0.000000   Median :0.0000   Median :0     
##  Mean   :0.008214   Mean   :0.002053   Mean   :0.1951   Mean   :0     
##  3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.0000   3rd Qu.:0     
##  Max.   :2.000000   Max.   :2.000000   Max.   :8.0000   Max.   :0     
##  Peaceful_protest_count Peaceful_protest_fatalities protest_count   
##  Min.   : 0.000         Min.   :0                   Min.   : 0.000  
##  1st Qu.: 0.000         1st Qu.:0                   1st Qu.: 0.000  
##  Median : 2.000         Median :0                   Median : 2.000  
##  Mean   : 3.606         Mean   :0                   Mean   : 3.809  
##  3rd Qu.: 5.000         3rd Qu.:0                   3rd Qu.: 5.000  
##  Max.   :39.000         Max.   :0                   Max.   :42.000  
##  protest_fatalities mob_violence_count mob_violence_fatalities
##  Min.   :0.000000   Min.   :0.0000     Min.   :0.000000       
##  1st Qu.:0.000000   1st Qu.:0.0000     1st Qu.:0.000000       
##  Median :0.000000   Median :0.0000     Median :0.000000       
##  Mean   :0.002053   Mean   :0.6454     Mean   :0.001369       
##  3rd Qu.:0.000000   3rd Qu.:1.0000     3rd Qu.:0.000000       
##  Max.   :2.000000   Max.   :9.0000     Max.   :1.000000       
##  violent_demonstration_count violent_demonstration_fatalities   riot_count    
##  Min.   : 0.00               Min.   :0.000000                 Min.   : 0.000  
##  1st Qu.: 0.00               1st Qu.:0.000000                 1st Qu.: 0.000  
##  Median : 1.00               Median :0.000000                 Median : 1.000  
##  Mean   : 1.57               Mean   :0.003422                 Mean   : 2.216  
##  3rd Qu.: 2.00               3rd Qu.:0.000000                 3rd Qu.: 3.000  
##  Max.   :27.00               Max.   :5.000000                 Max.   :29.000  
##  riot_fatalities    total_violent_events total_fatalities 
##  Min.   :0.000000   Min.   : 0.00        Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.: 1.00        1st Qu.:0.00000  
##  Median :0.000000   Median : 4.00        Median :0.00000  
##  Mean   :0.004791   Mean   : 6.05        Mean   :0.01437  
##  3rd Qu.:0.000000   3rd Qu.: 8.00        3rd Qu.:0.00000  
##  Max.   :5.000000   Max.   :58.00        Max.   :5.00000
 # Determine dimensions
dim(events)
## [1] 1461   34
# Determine which column/predictor numbers we want to analyze
colnames(events)
##  [1] "Armed_clash_count"                "Armed_clash_fatalities"          
##  [3] "battle_count"                     "battle_fatalities"               
##  [5] "SAMA_count"                       "SAMA_fatalities"                 
##  [7] "RELIED_count"                     "RELIED_fatalities"               
##  [9] "Grenade_count"                    "Grenade_fatalities"              
## [11] "explosion_count"                  "explosions_fatalities"           
## [13] "Sexual_violence_count"            "sexual_violence_fatalities"      
## [15] "Attack_count"                     "Attack_fatalities"               
## [17] "VAC_count"                        "VAC_fatalities"                  
## [19] "EFAP_count"                       "EFAP_fatalities"                 
## [21] "PWI_count"                        "PWI_fatalities"                  
## [23] "Peaceful_protest_count"           "Peaceful_protest_fatalities"     
## [25] "protest_count"                    "protest_fatalities"              
## [27] "mob_violence_count"               "mob_violence_fatalities"         
## [29] "violent_demonstration_count"      "violent_demonstration_fatalities"
## [31] "riot_count"                       "riot_fatalities"                 
## [33] "total_violent_events"             "total_fatalities"
# We select five events to analyze
new_events <- events[, c(23,25,29,31,33)]

# Applying princomp() function to the dataset.
pca1 <- princomp(scale(new_events))

# Determine the 'loadings' i.e. weight of each variable
pca1$loadings
## 
## Loadings:
##                             Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Peaceful_protest_count       0.453  0.436  0.105  0.770       
## protest_count                0.455  0.432        -0.512 -0.579
## violent_demonstration_count  0.398 -0.576  0.714              
## riot_count                   0.418 -0.528 -0.657  0.150 -0.303
## total_violent_events         0.504  0.110 -0.197 -0.349  0.757
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## SS loadings       1.0    1.0    1.0    1.0    1.0
## Proportion Var    0.2    0.2    0.2    0.2    0.2
## Cumulative Var    0.2    0.4    0.6    0.8    1.0
## Histogram Scree chart 
pca.var <- pca1$sdev^2
pca.var.per <- round(pca.var/sum(pca.var)*100, 1)

pc <- prcomp(new_events, scale=TRUE)

# First four principal components
comp <- data.frame(pc$x[,1:4])

# Create a new predictor variable using the ACLED values (peaceful protests, etc) * their respective weights computed from PCA above
master_df$PCA_Value <- round(master_df$Peaceful_protest_count*pca1$loadings[1] + master_df$protest_count*pca1$loadings[2] + master_df$violent_demonstration_count*pca1$loadings[3] + master_df$riot_count*pca1$loadings[4] + as.numeric(master_df$riot_count)*pca1$loadings[5])
# Determine which coefficients produce the most accurate model
regfit.backward.full <- regsubsets(PCA_Value ~ AJArabic_Total_Posts + AJArabic_Favorites + AJArabic_Retweets + al_wafa_Total_Posts + al_wafa_Favorites + al_wafa_Retweets + Alwatan_Live_Total_Posts  + Alwatan_Live_Favorites +  Alwatan_Live_Retweets + bahrain_moi_Total_Posts + bahrain_moi_Favorites + bahrain_moi_Retweets + BahrainRights_Total_Posts + BahrainRights_Favorites + BahrainRights_Retweets + BBCArabic_Total_Posts  + BBCArabic_Favorites + BBCArabic_Retweets + bh14feb2011_Total_Posts + bh14feb2011_Favorites + bh14feb2011_Retweets + bna_ar_Total_Posts + bna_ar_Favorites + bna_ar_Retweets + Coalition14_Total_Posts + Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Total_Posts + duraz_youth_Favorites + duraz_youth_Retweets + feb14revolution_Total_Posts + feb14revolution_Favorites +    feb14revolution_Retweets + GDNonline_Total_Posts + GDNonline_Favorites + GDNonline_Retweets + Iran_Total_Posts + Iran_Favorites + Iran_Retweets + IranNW_Total_Posts + IranNW_Favorites + IranNW_Retweets  + khalidalkhalifa_Total_Posts + khalidalkhalifa_Favorites + khalidalkhalifa_Retweets + khamenei_Total_Posts  + khamenei_Favorites + khamenei_Retweets + KUhp2222_Total_Posts + KUhp2222_Favorites +  KUhp2222_Retweets + malarab1_Total_Posts + malarab1_Favorites + malarab1_Retweets + NABEELRAJAB_Total_Posts + NABEELRAJAB_Favorites + NABEELRAJAB_Retweets + netanyahu_Total_Posts + netanyahu_Favorites + netanyahu_Retweets + NSA_Bahrain_Total_Posts + NSA_Bahrain_Favorites + NSA_Bahrain_Retweets + rouhani_Total_Posts + rouhani_Favorites + rouhani_Retweets + trump_Total_Posts + trump_Favorites + trump_Retweets + USEmbassyManama_Total_Posts + USEmbassyManama_Favorites + USEmbassyManama_Retweets + TEMP + DEWP + WDSP + MXSPD + PRCP + zinc_Close + zinc_Open + zinc_High + zinc_Low + WTI_Close + WTI_Open + WTI_High + WTI_Low + wheat_Close + wheat_Open + wheat_High + wheat_Low + tin_Close + tin_Open + tin_High + tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low + soybean_Close + soybean_Open + soybean_High + soybean_Low + silver_Close + silver_Open + silver_High + silver_Low + rice_Close + rice_Open + rice_High + rice_Low + platinum_Close + platinum_Open + platinum_High + platinum_Low + nickel_Close + nickel_Open + nickel_High + nickel_Low + natural_gas_Close + natural_gas_Open + natural_gas_High + natural_gas_Low + monero_Close + monero_Open + monero_High + monero_Low + litecoin_Close + litecoin_Open + litecoin_High + litecoin_Low + lead_Close + lead_Open + lead_High + lead_Low + Iron_USA_Close + Iron_USA_Open + Iron_USA_High + Iron_USA_Low + Gold_Close + Gold_Open + Gold_High + Gold_Low +GBP_BHD_Close + GBP_BHD_Open+ GBP_BHD_High + GBP_BHD_Low + cotton_Close + cotton_Open + cotton_High + cotton_Low +  corn_Close +  corn_Open + corn_High + corn_Low + copper_Close + copper_Open + copper_High + copper_Low + coffee_Close + coffee_Open + coffee_High + coffee_Low + live_cattle_Close + live_cattle_Open + live_cattle_High + live_cattle_Low + feed_cattle_Close + feed_cattle_Open + feed_cattle_High + feed_cattle_Low + Brent_Close + Brent_Open + Brent_High + Brent_Low + Bitcoin_Close + Bitcoin_Open + Bitcoin_High + Bitcoin_Low + BHD_EUR_Close + BHD_EUR_Open + BHD_EUR_High + BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + BAX_Low, data = master_df, nvmax = 500, really.big =  TRUE, method = 'backward')

# Produce a summary of the full model
res.backward.sum <- summary(regfit.backward.full)

# Determine which model has the highest R-squared
backward.adjr2 <- which.max(res.backward.sum$adjr2)

# Determine which model has the lowest Mallow's CP
backward.cp <- which.min(res.backward.sum$cp)

# Determine which model has the lowest BIC
backward.bic <- which.min(res.backward.sum$bic)
## [1] "The backwards model with the highest R-squared is 83 with a value of 0.6894"
## [1] "The backwards model with the lowest Mallow's CP is 43 with a value of -6.3124"
## [1] "The backwards model with the lowest BIC is 20 with a value of -1479.9299"
## Backwards model

# Determines which features we should select according to our three measures of effectiveness: R-squared, Mallow's CP, and BIC
backward.adjr2.formula <- get_model_formula(backward.adjr2, regfit.backward.full, "PCA_Value")
backward.cp.formula <- get_model_formula(backward.cp, regfit.backward.full, "PCA_Value")
backward.bic.formula <- get_model_formula(backward.bic, regfit.backward.full, "PCA_Value")
# Determine which coefficients produce the most accurate model
regfit.forward.full <- regsubsets(PCA_Value ~ AJArabic_Total_Posts + AJArabic_Favorites + AJArabic_Retweets + al_wafa_Total_Posts + al_wafa_Favorites + al_wafa_Retweets + Alwatan_Live_Total_Posts  + Alwatan_Live_Favorites +  Alwatan_Live_Retweets + bahrain_moi_Total_Posts + bahrain_moi_Favorites + bahrain_moi_Retweets + BahrainRights_Total_Posts + BahrainRights_Favorites + BahrainRights_Retweets + BBCArabic_Total_Posts  + BBCArabic_Favorites + BBCArabic_Retweets + bh14feb2011_Total_Posts + bh14feb2011_Favorites + bh14feb2011_Retweets + bna_ar_Total_Posts + bna_ar_Favorites + bna_ar_Retweets + Coalition14_Total_Posts + Coalition14_Favorites + Coalition14_Retweets + duraz_youth_Total_Posts + duraz_youth_Favorites + duraz_youth_Retweets + feb14revolution_Total_Posts + feb14revolution_Favorites +    feb14revolution_Retweets + GDNonline_Total_Posts + GDNonline_Favorites + GDNonline_Retweets + Iran_Total_Posts + Iran_Favorites + Iran_Retweets + IranNW_Total_Posts + IranNW_Favorites + IranNW_Retweets  + khalidalkhalifa_Total_Posts + khalidalkhalifa_Favorites + khalidalkhalifa_Retweets + khamenei_Total_Posts  + khamenei_Favorites + khamenei_Retweets + KUhp2222_Total_Posts + KUhp2222_Favorites +  KUhp2222_Retweets + malarab1_Total_Posts + malarab1_Favorites + malarab1_Retweets + NABEELRAJAB_Total_Posts + NABEELRAJAB_Favorites + NABEELRAJAB_Retweets + netanyahu_Total_Posts + netanyahu_Favorites + netanyahu_Retweets + NSA_Bahrain_Total_Posts + NSA_Bahrain_Favorites + NSA_Bahrain_Retweets + rouhani_Total_Posts + rouhani_Favorites + rouhani_Retweets + trump_Total_Posts + trump_Favorites + trump_Retweets + USEmbassyManama_Total_Posts + USEmbassyManama_Favorites + USEmbassyManama_Retweets + TEMP + DEWP + WDSP + MXSPD + PRCP + zinc_Close + zinc_Open + zinc_High + zinc_Low + WTI_Close + WTI_Open + WTI_High + WTI_Low + wheat_Close + wheat_Open + wheat_High + wheat_Low + tin_Close + tin_Open + tin_High + tin_Low + sugar_Close + sugar_Open + sugar_High + sugar_Low + soybean_Close + soybean_Open + soybean_High + soybean_Low + silver_Close + silver_Open + silver_High + silver_Low + rice_Close + rice_Open + rice_High + rice_Low + platinum_Close + platinum_Open + platinum_High + platinum_Low + nickel_Close + nickel_Open + nickel_High + nickel_Low + natural_gas_Close + natural_gas_Open + natural_gas_High + natural_gas_Low + monero_Close + monero_Open + monero_High + monero_Low + litecoin_Close + litecoin_Open + litecoin_High + litecoin_Low + lead_Close + lead_Open + lead_High + lead_Low + Iron_USA_Close + Iron_USA_Open + Iron_USA_High + Iron_USA_Low + Gold_Close + Gold_Open + Gold_High + Gold_Low +GBP_BHD_Close + GBP_BHD_Open+ GBP_BHD_High + GBP_BHD_Low + cotton_Close + cotton_Open + cotton_High + cotton_Low +  corn_Close +  corn_Open + corn_High + corn_Low + copper_Close + copper_Open + copper_High + copper_Low + coffee_Close + coffee_Open + coffee_High + coffee_Low + live_cattle_Close + live_cattle_Open + live_cattle_High + live_cattle_Low + feed_cattle_Close + feed_cattle_Open + feed_cattle_High + feed_cattle_Low + Brent_Close + Brent_Open + Brent_High + Brent_Low + Bitcoin_Close + Bitcoin_Open + Bitcoin_High + Bitcoin_Low + BHD_EUR_Close + BHD_EUR_Open + BHD_EUR_High + BHD_EUR_Low + BAX_Close + BAX_Open + BAX_High + BAX_Low, data = master_df, nvmax = 500, really.big =  TRUE, method = 'forward')

# Produce a summary of the full model
res.forward.sum <- summary(regfit.forward.full)

# Determine which model has the highest R-squared
forward.adjr2 <- which.max(res.forward.sum$adjr2)

# Determine which model has the lowest Mallow's CP
forward.cp <- which.min(res.forward.sum$cp)

# Determine which model has the lowest BIC
forward.bic <- which.min(res.forward.sum$bic)
## [1] "The forwards model with the highest R-squared is 90 with a value of 0.6856"
## [1] "The forwards model with the lowest Mallow's CP is 41 with a value of 0.9132"
## [1] "The forwards model with the lowest BIC is 20 with a value of -1475.5762"
## Forwards model

# Determines which features we should select according to our three measures of effectiveness: R-squared, Mallow's CP, and BIC
forward.adjr2.formula <- get_model_formula(forward.adjr2, regfit.forward.full, "PCA_Value")
forward.cp.formula <- get_model_formula(forward.cp, regfit.forward.full, "PCA_Value")
forward.bic.formula <- get_model_formula(forward.bic, regfit.forward.full, "PCA_Value")
# Subset the dataframe
riots_new <- master_df[, c(backward.adjr2.formula, "PCA_Value")]
# First: 1:(t-1)

i <- 1 # Initialize a counter
while(i <= length(backward.adjr2.formula)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(backward.adjr2.formula[i],"1 <- riots_new[1:(t-1),", i, "]"), "\n") 
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts1 <- riots_new[1:(t-1),1] 
## bahrain_moi_Total_Posts1 <- riots_new[1:(t-1),2] 
## BahrainRights_Favorites1 <- riots_new[1:(t-1),3] 
## BahrainRights_Retweets1 <- riots_new[1:(t-1),4] 
## BBCArabic_Total_Posts1 <- riots_new[1:(t-1),5] 
## BBCArabic_Favorites1 <- riots_new[1:(t-1),6] 
## BBCArabic_Retweets1 <- riots_new[1:(t-1),7] 
## bh14feb2011_Total_Posts1 <- riots_new[1:(t-1),8] 
## bh14feb2011_Favorites1 <- riots_new[1:(t-1),9] 
## bh14feb2011_Retweets1 <- riots_new[1:(t-1),10] 
## bna_ar_Total_Posts1 <- riots_new[1:(t-1),11] 
## bna_ar_Retweets1 <- riots_new[1:(t-1),12] 
## Coalition14_Total_Posts1 <- riots_new[1:(t-1),13] 
## Coalition14_Favorites1 <- riots_new[1:(t-1),14] 
## duraz_youth_Total_Posts1 <- riots_new[1:(t-1),15] 
## duraz_youth_Favorites1 <- riots_new[1:(t-1),16] 
## duraz_youth_Retweets1 <- riots_new[1:(t-1),17] 
## feb14revolution_Retweets1 <- riots_new[1:(t-1),18] 
## GDNonline_Total_Posts1 <- riots_new[1:(t-1),19] 
## GDNonline_Favorites1 <- riots_new[1:(t-1),20] 
## GDNonline_Retweets1 <- riots_new[1:(t-1),21] 
## Iran_Total_Posts1 <- riots_new[1:(t-1),22] 
## Iran_Favorites1 <- riots_new[1:(t-1),23] 
## IranNW_Retweets1 <- riots_new[1:(t-1),24] 
## malarab1_Favorites1 <- riots_new[1:(t-1),25] 
## NABEELRAJAB_Favorites1 <- riots_new[1:(t-1),26] 
## netanyahu_Favorites1 <- riots_new[1:(t-1),27] 
## netanyahu_Retweets1 <- riots_new[1:(t-1),28] 
## rouhani_Total_Posts1 <- riots_new[1:(t-1),29] 
## rouhani_Favorites1 <- riots_new[1:(t-1),30] 
## USEmbassyManama_Retweets1 <- riots_new[1:(t-1),31] 
## TEMP1 <- riots_new[1:(t-1),32] 
## DEWP1 <- riots_new[1:(t-1),33] 
## WDSP1 <- riots_new[1:(t-1),34] 
## PRCP1 <- riots_new[1:(t-1),35] 
## zinc_Open1 <- riots_new[1:(t-1),36] 
## zinc_Low1 <- riots_new[1:(t-1),37] 
## WTI_Close1 <- riots_new[1:(t-1),38] 
## WTI_Low1 <- riots_new[1:(t-1),39] 
## wheat_Open1 <- riots_new[1:(t-1),40] 
## wheat_High1 <- riots_new[1:(t-1),41] 
## wheat_Low1 <- riots_new[1:(t-1),42] 
## tin_Close1 <- riots_new[1:(t-1),43] 
## tin_High1 <- riots_new[1:(t-1),44] 
## tin_Low1 <- riots_new[1:(t-1),45] 
## sugar_Close1 <- riots_new[1:(t-1),46] 
## sugar_Open1 <- riots_new[1:(t-1),47] 
## sugar_Low1 <- riots_new[1:(t-1),48] 
## soybean_Close1 <- riots_new[1:(t-1),49] 
## soybean_Open1 <- riots_new[1:(t-1),50] 
## soybean_High1 <- riots_new[1:(t-1),51] 
## soybean_Low1 <- riots_new[1:(t-1),52] 
## silver_High1 <- riots_new[1:(t-1),53] 
## rice_Close1 <- riots_new[1:(t-1),54] 
## rice_High1 <- riots_new[1:(t-1),55] 
## platinum_Close1 <- riots_new[1:(t-1),56] 
## natural_gas_Close1 <- riots_new[1:(t-1),57] 
## monero_Close1 <- riots_new[1:(t-1),58] 
## monero_High1 <- riots_new[1:(t-1),59] 
## litecoin_Close1 <- riots_new[1:(t-1),60] 
## litecoin_Open1 <- riots_new[1:(t-1),61] 
## litecoin_Low1 <- riots_new[1:(t-1),62] 
## lead_High1 <- riots_new[1:(t-1),63] 
## lead_Low1 <- riots_new[1:(t-1),64] 
## Gold_Low1 <- riots_new[1:(t-1),65] 
## cotton_Low1 <- riots_new[1:(t-1),66] 
## corn_High1 <- riots_new[1:(t-1),67] 
## corn_Low1 <- riots_new[1:(t-1),68] 
## copper_Close1 <- riots_new[1:(t-1),69] 
## copper_High1 <- riots_new[1:(t-1),70] 
## coffee_Close1 <- riots_new[1:(t-1),71] 
## coffee_High1 <- riots_new[1:(t-1),72] 
## live_cattle_Close1 <- riots_new[1:(t-1),73] 
## live_cattle_High1 <- riots_new[1:(t-1),74] 
## live_cattle_Low1 <- riots_new[1:(t-1),75] 
## feed_cattle_Close1 <- riots_new[1:(t-1),76] 
## feed_cattle_Open1 <- riots_new[1:(t-1),77] 
## Brent_Close1 <- riots_new[1:(t-1),78] 
## Brent_Open1 <- riots_new[1:(t-1),79] 
## Bitcoin_Open1 <- riots_new[1:(t-1),80] 
## Bitcoin_Low1 <- riots_new[1:(t-1),81] 
## BAX_Open1 <- riots_new[1:(t-1),82] 
## BAX_High1 <- riots_new[1:(t-1),83]
# Second: 2:t

i <- 1 # Initialize a counter
while(i <= length(backward.adjr2.formula)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(backward.adjr2.formula[i],"2 <- riots_new[2:t,", i, "]"), "\n") 
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts2 <- riots_new[2:t,1] 
## bahrain_moi_Total_Posts2 <- riots_new[2:t,2] 
## BahrainRights_Favorites2 <- riots_new[2:t,3] 
## BahrainRights_Retweets2 <- riots_new[2:t,4] 
## BBCArabic_Total_Posts2 <- riots_new[2:t,5] 
## BBCArabic_Favorites2 <- riots_new[2:t,6] 
## BBCArabic_Retweets2 <- riots_new[2:t,7] 
## bh14feb2011_Total_Posts2 <- riots_new[2:t,8] 
## bh14feb2011_Favorites2 <- riots_new[2:t,9] 
## bh14feb2011_Retweets2 <- riots_new[2:t,10] 
## bna_ar_Total_Posts2 <- riots_new[2:t,11] 
## bna_ar_Retweets2 <- riots_new[2:t,12] 
## Coalition14_Total_Posts2 <- riots_new[2:t,13] 
## Coalition14_Favorites2 <- riots_new[2:t,14] 
## duraz_youth_Total_Posts2 <- riots_new[2:t,15] 
## duraz_youth_Favorites2 <- riots_new[2:t,16] 
## duraz_youth_Retweets2 <- riots_new[2:t,17] 
## feb14revolution_Retweets2 <- riots_new[2:t,18] 
## GDNonline_Total_Posts2 <- riots_new[2:t,19] 
## GDNonline_Favorites2 <- riots_new[2:t,20] 
## GDNonline_Retweets2 <- riots_new[2:t,21] 
## Iran_Total_Posts2 <- riots_new[2:t,22] 
## Iran_Favorites2 <- riots_new[2:t,23] 
## IranNW_Retweets2 <- riots_new[2:t,24] 
## malarab1_Favorites2 <- riots_new[2:t,25] 
## NABEELRAJAB_Favorites2 <- riots_new[2:t,26] 
## netanyahu_Favorites2 <- riots_new[2:t,27] 
## netanyahu_Retweets2 <- riots_new[2:t,28] 
## rouhani_Total_Posts2 <- riots_new[2:t,29] 
## rouhani_Favorites2 <- riots_new[2:t,30] 
## USEmbassyManama_Retweets2 <- riots_new[2:t,31] 
## TEMP2 <- riots_new[2:t,32] 
## DEWP2 <- riots_new[2:t,33] 
## WDSP2 <- riots_new[2:t,34] 
## PRCP2 <- riots_new[2:t,35] 
## zinc_Open2 <- riots_new[2:t,36] 
## zinc_Low2 <- riots_new[2:t,37] 
## WTI_Close2 <- riots_new[2:t,38] 
## WTI_Low2 <- riots_new[2:t,39] 
## wheat_Open2 <- riots_new[2:t,40] 
## wheat_High2 <- riots_new[2:t,41] 
## wheat_Low2 <- riots_new[2:t,42] 
## tin_Close2 <- riots_new[2:t,43] 
## tin_High2 <- riots_new[2:t,44] 
## tin_Low2 <- riots_new[2:t,45] 
## sugar_Close2 <- riots_new[2:t,46] 
## sugar_Open2 <- riots_new[2:t,47] 
## sugar_Low2 <- riots_new[2:t,48] 
## soybean_Close2 <- riots_new[2:t,49] 
## soybean_Open2 <- riots_new[2:t,50] 
## soybean_High2 <- riots_new[2:t,51] 
## soybean_Low2 <- riots_new[2:t,52] 
## silver_High2 <- riots_new[2:t,53] 
## rice_Close2 <- riots_new[2:t,54] 
## rice_High2 <- riots_new[2:t,55] 
## platinum_Close2 <- riots_new[2:t,56] 
## natural_gas_Close2 <- riots_new[2:t,57] 
## monero_Close2 <- riots_new[2:t,58] 
## monero_High2 <- riots_new[2:t,59] 
## litecoin_Close2 <- riots_new[2:t,60] 
## litecoin_Open2 <- riots_new[2:t,61] 
## litecoin_Low2 <- riots_new[2:t,62] 
## lead_High2 <- riots_new[2:t,63] 
## lead_Low2 <- riots_new[2:t,64] 
## Gold_Low2 <- riots_new[2:t,65] 
## cotton_Low2 <- riots_new[2:t,66] 
## corn_High2 <- riots_new[2:t,67] 
## corn_Low2 <- riots_new[2:t,68] 
## copper_Close2 <- riots_new[2:t,69] 
## copper_High2 <- riots_new[2:t,70] 
## coffee_Close2 <- riots_new[2:t,71] 
## coffee_High2 <- riots_new[2:t,72] 
## live_cattle_Close2 <- riots_new[2:t,73] 
## live_cattle_High2 <- riots_new[2:t,74] 
## live_cattle_Low2 <- riots_new[2:t,75] 
## feed_cattle_Close2 <- riots_new[2:t,76] 
## feed_cattle_Open2 <- riots_new[2:t,77] 
## Brent_Close2 <- riots_new[2:t,78] 
## Brent_Open2 <- riots_new[2:t,79] 
## Bitcoin_Open2 <- riots_new[2:t,80] 
## Bitcoin_Low2 <- riots_new[2:t,81] 
## BAX_Open2 <- riots_new[2:t,82] 
## BAX_High2 <- riots_new[2:t,83]
# Third: First part of training dataframe

i <- 1 # Initialize a counter
while(i <= length(backward.adjr2.formula)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(backward.adjr2.formula[i],"1, "))
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts1, bahrain_moi_Total_Posts1, BahrainRights_Favorites1, BahrainRights_Retweets1, BBCArabic_Total_Posts1, BBCArabic_Favorites1, BBCArabic_Retweets1, bh14feb2011_Total_Posts1, bh14feb2011_Favorites1, bh14feb2011_Retweets1, bna_ar_Total_Posts1, bna_ar_Retweets1, Coalition14_Total_Posts1, Coalition14_Favorites1, duraz_youth_Total_Posts1, duraz_youth_Favorites1, duraz_youth_Retweets1, feb14revolution_Retweets1, GDNonline_Total_Posts1, GDNonline_Favorites1, GDNonline_Retweets1, Iran_Total_Posts1, Iran_Favorites1, IranNW_Retweets1, malarab1_Favorites1, NABEELRAJAB_Favorites1, netanyahu_Favorites1, netanyahu_Retweets1, rouhani_Total_Posts1, rouhani_Favorites1, USEmbassyManama_Retweets1, TEMP1, DEWP1, WDSP1, PRCP1, zinc_Open1, zinc_Low1, WTI_Close1, WTI_Low1, wheat_Open1, wheat_High1, wheat_Low1, tin_Close1, tin_High1, tin_Low1, sugar_Close1, sugar_Open1, sugar_Low1, soybean_Close1, soybean_Open1, soybean_High1, soybean_Low1, silver_High1, rice_Close1, rice_High1, platinum_Close1, natural_gas_Close1, monero_Close1, monero_High1, litecoin_Close1, litecoin_Open1, litecoin_Low1, lead_High1, lead_Low1, Gold_Low1, cotton_Low1, corn_High1, corn_Low1, copper_Close1, copper_High1, coffee_Close1, coffee_High1, live_cattle_Close1, live_cattle_High1, live_cattle_Low1, feed_cattle_Close1, feed_cattle_Open1, Brent_Close1, Brent_Open1, Bitcoin_Open1, Bitcoin_Low1, BAX_Open1, BAX_High1,
# Fourth: Second part of training dataframe

i <- 1 # Initialize a counter
while(i <= length(backward.adjr2.formula)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(backward.adjr2.formula[i],"2, "))
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts2, bahrain_moi_Total_Posts2, BahrainRights_Favorites2, BahrainRights_Retweets2, BBCArabic_Total_Posts2, BBCArabic_Favorites2, BBCArabic_Retweets2, bh14feb2011_Total_Posts2, bh14feb2011_Favorites2, bh14feb2011_Retweets2, bna_ar_Total_Posts2, bna_ar_Retweets2, Coalition14_Total_Posts2, Coalition14_Favorites2, duraz_youth_Total_Posts2, duraz_youth_Favorites2, duraz_youth_Retweets2, feb14revolution_Retweets2, GDNonline_Total_Posts2, GDNonline_Favorites2, GDNonline_Retweets2, Iran_Total_Posts2, Iran_Favorites2, IranNW_Retweets2, malarab1_Favorites2, NABEELRAJAB_Favorites2, netanyahu_Favorites2, netanyahu_Retweets2, rouhani_Total_Posts2, rouhani_Favorites2, USEmbassyManama_Retweets2, TEMP2, DEWP2, WDSP2, PRCP2, zinc_Open2, zinc_Low2, WTI_Close2, WTI_Low2, wheat_Open2, wheat_High2, wheat_Low2, tin_Close2, tin_High2, tin_Low2, sugar_Close2, sugar_Open2, sugar_Low2, soybean_Close2, soybean_Open2, soybean_High2, soybean_Low2, silver_High2, rice_Close2, rice_High2, platinum_Close2, natural_gas_Close2, monero_Close2, monero_High2, litecoin_Close2, litecoin_Open2, litecoin_Low2, lead_High2, lead_Low2, Gold_Low2, cotton_Low2, corn_High2, corn_Low2, copper_Close2, copper_High2, coffee_Close2, coffee_High2, live_cattle_Close2, live_cattle_High2, live_cattle_Low2, feed_cattle_Close2, feed_cattle_Open2, Brent_Close2, Brent_Open2, Bitcoin_Open2, Bitcoin_Low2, BAX_Open2, BAX_High2,
# Fifth: t:(t+p-1)

i <- 1 # Initialize a counter
while(i <= length(backward.adjr2.formula)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(backward.adjr2.formula[i],"1 <- riots_new[t:(t+p-1),", i, "]"), "\n") 
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts1 <- riots_new[t:(t+p-1),1] 
## bahrain_moi_Total_Posts1 <- riots_new[t:(t+p-1),2] 
## BahrainRights_Favorites1 <- riots_new[t:(t+p-1),3] 
## BahrainRights_Retweets1 <- riots_new[t:(t+p-1),4] 
## BBCArabic_Total_Posts1 <- riots_new[t:(t+p-1),5] 
## BBCArabic_Favorites1 <- riots_new[t:(t+p-1),6] 
## BBCArabic_Retweets1 <- riots_new[t:(t+p-1),7] 
## bh14feb2011_Total_Posts1 <- riots_new[t:(t+p-1),8] 
## bh14feb2011_Favorites1 <- riots_new[t:(t+p-1),9] 
## bh14feb2011_Retweets1 <- riots_new[t:(t+p-1),10] 
## bna_ar_Total_Posts1 <- riots_new[t:(t+p-1),11] 
## bna_ar_Retweets1 <- riots_new[t:(t+p-1),12] 
## Coalition14_Total_Posts1 <- riots_new[t:(t+p-1),13] 
## Coalition14_Favorites1 <- riots_new[t:(t+p-1),14] 
## duraz_youth_Total_Posts1 <- riots_new[t:(t+p-1),15] 
## duraz_youth_Favorites1 <- riots_new[t:(t+p-1),16] 
## duraz_youth_Retweets1 <- riots_new[t:(t+p-1),17] 
## feb14revolution_Retweets1 <- riots_new[t:(t+p-1),18] 
## GDNonline_Total_Posts1 <- riots_new[t:(t+p-1),19] 
## GDNonline_Favorites1 <- riots_new[t:(t+p-1),20] 
## GDNonline_Retweets1 <- riots_new[t:(t+p-1),21] 
## Iran_Total_Posts1 <- riots_new[t:(t+p-1),22] 
## Iran_Favorites1 <- riots_new[t:(t+p-1),23] 
## IranNW_Retweets1 <- riots_new[t:(t+p-1),24] 
## malarab1_Favorites1 <- riots_new[t:(t+p-1),25] 
## NABEELRAJAB_Favorites1 <- riots_new[t:(t+p-1),26] 
## netanyahu_Favorites1 <- riots_new[t:(t+p-1),27] 
## netanyahu_Retweets1 <- riots_new[t:(t+p-1),28] 
## rouhani_Total_Posts1 <- riots_new[t:(t+p-1),29] 
## rouhani_Favorites1 <- riots_new[t:(t+p-1),30] 
## USEmbassyManama_Retweets1 <- riots_new[t:(t+p-1),31] 
## TEMP1 <- riots_new[t:(t+p-1),32] 
## DEWP1 <- riots_new[t:(t+p-1),33] 
## WDSP1 <- riots_new[t:(t+p-1),34] 
## PRCP1 <- riots_new[t:(t+p-1),35] 
## zinc_Open1 <- riots_new[t:(t+p-1),36] 
## zinc_Low1 <- riots_new[t:(t+p-1),37] 
## WTI_Close1 <- riots_new[t:(t+p-1),38] 
## WTI_Low1 <- riots_new[t:(t+p-1),39] 
## wheat_Open1 <- riots_new[t:(t+p-1),40] 
## wheat_High1 <- riots_new[t:(t+p-1),41] 
## wheat_Low1 <- riots_new[t:(t+p-1),42] 
## tin_Close1 <- riots_new[t:(t+p-1),43] 
## tin_High1 <- riots_new[t:(t+p-1),44] 
## tin_Low1 <- riots_new[t:(t+p-1),45] 
## sugar_Close1 <- riots_new[t:(t+p-1),46] 
## sugar_Open1 <- riots_new[t:(t+p-1),47] 
## sugar_Low1 <- riots_new[t:(t+p-1),48] 
## soybean_Close1 <- riots_new[t:(t+p-1),49] 
## soybean_Open1 <- riots_new[t:(t+p-1),50] 
## soybean_High1 <- riots_new[t:(t+p-1),51] 
## soybean_Low1 <- riots_new[t:(t+p-1),52] 
## silver_High1 <- riots_new[t:(t+p-1),53] 
## rice_Close1 <- riots_new[t:(t+p-1),54] 
## rice_High1 <- riots_new[t:(t+p-1),55] 
## platinum_Close1 <- riots_new[t:(t+p-1),56] 
## natural_gas_Close1 <- riots_new[t:(t+p-1),57] 
## monero_Close1 <- riots_new[t:(t+p-1),58] 
## monero_High1 <- riots_new[t:(t+p-1),59] 
## litecoin_Close1 <- riots_new[t:(t+p-1),60] 
## litecoin_Open1 <- riots_new[t:(t+p-1),61] 
## litecoin_Low1 <- riots_new[t:(t+p-1),62] 
## lead_High1 <- riots_new[t:(t+p-1),63] 
## lead_Low1 <- riots_new[t:(t+p-1),64] 
## Gold_Low1 <- riots_new[t:(t+p-1),65] 
## cotton_Low1 <- riots_new[t:(t+p-1),66] 
## corn_High1 <- riots_new[t:(t+p-1),67] 
## corn_Low1 <- riots_new[t:(t+p-1),68] 
## copper_Close1 <- riots_new[t:(t+p-1),69] 
## copper_High1 <- riots_new[t:(t+p-1),70] 
## coffee_Close1 <- riots_new[t:(t+p-1),71] 
## coffee_High1 <- riots_new[t:(t+p-1),72] 
## live_cattle_Close1 <- riots_new[t:(t+p-1),73] 
## live_cattle_High1 <- riots_new[t:(t+p-1),74] 
## live_cattle_Low1 <- riots_new[t:(t+p-1),75] 
## feed_cattle_Close1 <- riots_new[t:(t+p-1),76] 
## feed_cattle_Open1 <- riots_new[t:(t+p-1),77] 
## Brent_Close1 <- riots_new[t:(t+p-1),78] 
## Brent_Open1 <- riots_new[t:(t+p-1),79] 
## Bitcoin_Open1 <- riots_new[t:(t+p-1),80] 
## Bitcoin_Low1 <- riots_new[t:(t+p-1),81] 
## BAX_Open1 <- riots_new[t:(t+p-1),82] 
## BAX_High1 <- riots_new[t:(t+p-1),83]
# Sixth: (t+1):(t+p)

i <- 1 # Initialize a counter
while(i <= length(backward.adjr2.formula)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(backward.adjr2.formula[i],"2 <- riots_new[(t+1):(t+p),", i, "]"), "\n") 
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts2 <- riots_new[(t+1):(t+p),1] 
## bahrain_moi_Total_Posts2 <- riots_new[(t+1):(t+p),2] 
## BahrainRights_Favorites2 <- riots_new[(t+1):(t+p),3] 
## BahrainRights_Retweets2 <- riots_new[(t+1):(t+p),4] 
## BBCArabic_Total_Posts2 <- riots_new[(t+1):(t+p),5] 
## BBCArabic_Favorites2 <- riots_new[(t+1):(t+p),6] 
## BBCArabic_Retweets2 <- riots_new[(t+1):(t+p),7] 
## bh14feb2011_Total_Posts2 <- riots_new[(t+1):(t+p),8] 
## bh14feb2011_Favorites2 <- riots_new[(t+1):(t+p),9] 
## bh14feb2011_Retweets2 <- riots_new[(t+1):(t+p),10] 
## bna_ar_Total_Posts2 <- riots_new[(t+1):(t+p),11] 
## bna_ar_Retweets2 <- riots_new[(t+1):(t+p),12] 
## Coalition14_Total_Posts2 <- riots_new[(t+1):(t+p),13] 
## Coalition14_Favorites2 <- riots_new[(t+1):(t+p),14] 
## duraz_youth_Total_Posts2 <- riots_new[(t+1):(t+p),15] 
## duraz_youth_Favorites2 <- riots_new[(t+1):(t+p),16] 
## duraz_youth_Retweets2 <- riots_new[(t+1):(t+p),17] 
## feb14revolution_Retweets2 <- riots_new[(t+1):(t+p),18] 
## GDNonline_Total_Posts2 <- riots_new[(t+1):(t+p),19] 
## GDNonline_Favorites2 <- riots_new[(t+1):(t+p),20] 
## GDNonline_Retweets2 <- riots_new[(t+1):(t+p),21] 
## Iran_Total_Posts2 <- riots_new[(t+1):(t+p),22] 
## Iran_Favorites2 <- riots_new[(t+1):(t+p),23] 
## IranNW_Retweets2 <- riots_new[(t+1):(t+p),24] 
## malarab1_Favorites2 <- riots_new[(t+1):(t+p),25] 
## NABEELRAJAB_Favorites2 <- riots_new[(t+1):(t+p),26] 
## netanyahu_Favorites2 <- riots_new[(t+1):(t+p),27] 
## netanyahu_Retweets2 <- riots_new[(t+1):(t+p),28] 
## rouhani_Total_Posts2 <- riots_new[(t+1):(t+p),29] 
## rouhani_Favorites2 <- riots_new[(t+1):(t+p),30] 
## USEmbassyManama_Retweets2 <- riots_new[(t+1):(t+p),31] 
## TEMP2 <- riots_new[(t+1):(t+p),32] 
## DEWP2 <- riots_new[(t+1):(t+p),33] 
## WDSP2 <- riots_new[(t+1):(t+p),34] 
## PRCP2 <- riots_new[(t+1):(t+p),35] 
## zinc_Open2 <- riots_new[(t+1):(t+p),36] 
## zinc_Low2 <- riots_new[(t+1):(t+p),37] 
## WTI_Close2 <- riots_new[(t+1):(t+p),38] 
## WTI_Low2 <- riots_new[(t+1):(t+p),39] 
## wheat_Open2 <- riots_new[(t+1):(t+p),40] 
## wheat_High2 <- riots_new[(t+1):(t+p),41] 
## wheat_Low2 <- riots_new[(t+1):(t+p),42] 
## tin_Close2 <- riots_new[(t+1):(t+p),43] 
## tin_High2 <- riots_new[(t+1):(t+p),44] 
## tin_Low2 <- riots_new[(t+1):(t+p),45] 
## sugar_Close2 <- riots_new[(t+1):(t+p),46] 
## sugar_Open2 <- riots_new[(t+1):(t+p),47] 
## sugar_Low2 <- riots_new[(t+1):(t+p),48] 
## soybean_Close2 <- riots_new[(t+1):(t+p),49] 
## soybean_Open2 <- riots_new[(t+1):(t+p),50] 
## soybean_High2 <- riots_new[(t+1):(t+p),51] 
## soybean_Low2 <- riots_new[(t+1):(t+p),52] 
## silver_High2 <- riots_new[(t+1):(t+p),53] 
## rice_Close2 <- riots_new[(t+1):(t+p),54] 
## rice_High2 <- riots_new[(t+1):(t+p),55] 
## platinum_Close2 <- riots_new[(t+1):(t+p),56] 
## natural_gas_Close2 <- riots_new[(t+1):(t+p),57] 
## monero_Close2 <- riots_new[(t+1):(t+p),58] 
## monero_High2 <- riots_new[(t+1):(t+p),59] 
## litecoin_Close2 <- riots_new[(t+1):(t+p),60] 
## litecoin_Open2 <- riots_new[(t+1):(t+p),61] 
## litecoin_Low2 <- riots_new[(t+1):(t+p),62] 
## lead_High2 <- riots_new[(t+1):(t+p),63] 
## lead_Low2 <- riots_new[(t+1):(t+p),64] 
## Gold_Low2 <- riots_new[(t+1):(t+p),65] 
## cotton_Low2 <- riots_new[(t+1):(t+p),66] 
## corn_High2 <- riots_new[(t+1):(t+p),67] 
## corn_Low2 <- riots_new[(t+1):(t+p),68] 
## copper_Close2 <- riots_new[(t+1):(t+p),69] 
## copper_High2 <- riots_new[(t+1):(t+p),70] 
## coffee_Close2 <- riots_new[(t+1):(t+p),71] 
## coffee_High2 <- riots_new[(t+1):(t+p),72] 
## live_cattle_Close2 <- riots_new[(t+1):(t+p),73] 
## live_cattle_High2 <- riots_new[(t+1):(t+p),74] 
## live_cattle_Low2 <- riots_new[(t+1):(t+p),75] 
## feed_cattle_Close2 <- riots_new[(t+1):(t+p),76] 
## feed_cattle_Open2 <- riots_new[(t+1):(t+p),77] 
## Brent_Close2 <- riots_new[(t+1):(t+p),78] 
## Brent_Open2 <- riots_new[(t+1):(t+p),79] 
## Bitcoin_Open2 <- riots_new[(t+1):(t+p),80] 
## Bitcoin_Low2 <- riots_new[(t+1):(t+p),81] 
## BAX_Open2 <- riots_new[(t+1):(t+p),82] 
## BAX_High2 <- riots_new[(t+1):(t+p),83]
# Delete test and train dataframes as they were used previously
test <- NULL
train <- NULL

### Define s
s <- 1

### Initial train set sample size
i.train.sample <- 1435

### One time period
p <- 7

### Define t
t <- i.train.sample + s

### Defining a train set
# Define predictors
## For time t - 1
al_wafa_Total_Posts1 <- riots_new[1:(t-1),1] 
bahrain_moi_Total_Posts1 <- riots_new[1:(t-1),2] 
BahrainRights_Favorites1 <- riots_new[1:(t-1),3] 
BahrainRights_Retweets1 <- riots_new[1:(t-1),4] 
BBCArabic_Total_Posts1 <- riots_new[1:(t-1),5] 
BBCArabic_Favorites1 <- riots_new[1:(t-1),6] 
BBCArabic_Retweets1 <- riots_new[1:(t-1),7] 
bh14feb2011_Total_Posts1 <- riots_new[1:(t-1),8] 
bh14feb2011_Favorites1 <- riots_new[1:(t-1),9] 
bh14feb2011_Retweets1 <- riots_new[1:(t-1),10] 
bna_ar_Total_Posts1 <- riots_new[1:(t-1),11] 
bna_ar_Retweets1 <- riots_new[1:(t-1),12] 
Coalition14_Total_Posts1 <- riots_new[1:(t-1),13] 
Coalition14_Favorites1 <- riots_new[1:(t-1),14] 
duraz_youth_Total_Posts1 <- riots_new[1:(t-1),15] 
duraz_youth_Favorites1 <- riots_new[1:(t-1),16] 
duraz_youth_Retweets1 <- riots_new[1:(t-1),17] 
feb14revolution_Retweets1 <- riots_new[1:(t-1),18] 
GDNonline_Total_Posts1 <- riots_new[1:(t-1),19] 
GDNonline_Favorites1 <- riots_new[1:(t-1),20] 
GDNonline_Retweets1 <- riots_new[1:(t-1),21] 
Iran_Total_Posts1 <- riots_new[1:(t-1),22] 
Iran_Favorites1 <- riots_new[1:(t-1),23] 
IranNW_Retweets1 <- riots_new[1:(t-1),24] 
malarab1_Favorites1 <- riots_new[1:(t-1),25] 
NABEELRAJAB_Favorites1 <- riots_new[1:(t-1),26] 
netanyahu_Favorites1 <- riots_new[1:(t-1),27] 
netanyahu_Retweets1 <- riots_new[1:(t-1),28] 
rouhani_Total_Posts1 <- riots_new[1:(t-1),29] 
rouhani_Favorites1 <- riots_new[1:(t-1),30] 
USEmbassyManama_Retweets1 <- riots_new[1:(t-1),31] 
TEMP1 <- riots_new[1:(t-1),32] 
DEWP1 <- riots_new[1:(t-1),33] 
WDSP1 <- riots_new[1:(t-1),34] 
PRCP1 <- riots_new[1:(t-1),35] 
zinc_Open1 <- riots_new[1:(t-1),36] 
zinc_Low1 <- riots_new[1:(t-1),37] 
WTI_Close1 <- riots_new[1:(t-1),38] 
WTI_Low1 <- riots_new[1:(t-1),39] 
wheat_Open1 <- riots_new[1:(t-1),40] 
wheat_High1 <- riots_new[1:(t-1),41] 
wheat_Low1 <- riots_new[1:(t-1),42] 
tin_Close1 <- riots_new[1:(t-1),43] 
tin_High1 <- riots_new[1:(t-1),44] 
tin_Low1 <- riots_new[1:(t-1),45] 
sugar_Close1 <- riots_new[1:(t-1),46] 
sugar_Open1 <- riots_new[1:(t-1),47] 
sugar_Low1 <- riots_new[1:(t-1),48] 
soybean_Close1 <- riots_new[1:(t-1),49] 
soybean_Open1 <- riots_new[1:(t-1),50] 
soybean_High1 <- riots_new[1:(t-1),51] 
soybean_Low1 <- riots_new[1:(t-1),52] 
silver_High1 <- riots_new[1:(t-1),53] 
rice_Close1 <- riots_new[1:(t-1),54] 
rice_High1 <- riots_new[1:(t-1),55] 
platinum_Close1 <- riots_new[1:(t-1),56] 
natural_gas_Close1 <- riots_new[1:(t-1),57] 
monero_Close1 <- riots_new[1:(t-1),58] 
monero_High1 <- riots_new[1:(t-1),59] 
litecoin_Close1 <- riots_new[1:(t-1),60] 
litecoin_Open1 <- riots_new[1:(t-1),61] 
litecoin_Low1 <- riots_new[1:(t-1),62] 
lead_High1 <- riots_new[1:(t-1),63] 
lead_Low1 <- riots_new[1:(t-1),64] 
Gold_Low1 <- riots_new[1:(t-1),65] 
cotton_Low1 <- riots_new[1:(t-1),66] 
corn_High1 <- riots_new[1:(t-1),67] 
corn_Low1 <- riots_new[1:(t-1),68] 
copper_Close1 <- riots_new[1:(t-1),69] 
copper_High1 <- riots_new[1:(t-1),70] 
coffee_Close1 <- riots_new[1:(t-1),71] 
coffee_High1 <- riots_new[1:(t-1),72] 
live_cattle_Close1 <- riots_new[1:(t-1),73] 
live_cattle_High1 <- riots_new[1:(t-1),74] 
live_cattle_Low1 <- riots_new[1:(t-1),75] 
feed_cattle_Close1 <- riots_new[1:(t-1),76] 
feed_cattle_Open1 <- riots_new[1:(t-1),77] 
Brent_Close1 <- riots_new[1:(t-1),78] 
Brent_Open1 <- riots_new[1:(t-1),79] 
Bitcoin_Open1 <- riots_new[1:(t-1),80] 
Bitcoin_Low1 <- riots_new[1:(t-1),81] 
BAX_Open1 <- riots_new[1:(t-1),82] 
BAX_High1 <- riots_new[1:(t-1),83] 
count1 <- riots_new[1:(t-1), 84]

## For time t
al_wafa_Total_Posts2 <- riots_new[2:t,1] 
bahrain_moi_Total_Posts2 <- riots_new[2:t,2] 
BahrainRights_Favorites2 <- riots_new[2:t,3] 
BahrainRights_Retweets2 <- riots_new[2:t,4] 
BBCArabic_Total_Posts2 <- riots_new[2:t,5] 
BBCArabic_Favorites2 <- riots_new[2:t,6] 
BBCArabic_Retweets2 <- riots_new[2:t,7] 
bh14feb2011_Total_Posts2 <- riots_new[2:t,8] 
bh14feb2011_Favorites2 <- riots_new[2:t,9] 
bh14feb2011_Retweets2 <- riots_new[2:t,10] 
bna_ar_Total_Posts2 <- riots_new[2:t,11] 
bna_ar_Retweets2 <- riots_new[2:t,12] 
Coalition14_Total_Posts2 <- riots_new[2:t,13] 
Coalition14_Favorites2 <- riots_new[2:t,14] 
duraz_youth_Total_Posts2 <- riots_new[2:t,15] 
duraz_youth_Favorites2 <- riots_new[2:t,16] 
duraz_youth_Retweets2 <- riots_new[2:t,17] 
feb14revolution_Retweets2 <- riots_new[2:t,18] 
GDNonline_Total_Posts2 <- riots_new[2:t,19] 
GDNonline_Favorites2 <- riots_new[2:t,20] 
GDNonline_Retweets2 <- riots_new[2:t,21] 
Iran_Total_Posts2 <- riots_new[2:t,22] 
Iran_Favorites2 <- riots_new[2:t,23] 
IranNW_Retweets2 <- riots_new[2:t,24] 
malarab1_Favorites2 <- riots_new[2:t,25] 
NABEELRAJAB_Favorites2 <- riots_new[2:t,26] 
netanyahu_Favorites2 <- riots_new[2:t,27] 
netanyahu_Retweets2 <- riots_new[2:t,28] 
rouhani_Total_Posts2 <- riots_new[2:t,29] 
rouhani_Favorites2 <- riots_new[2:t,30] 
USEmbassyManama_Retweets2 <- riots_new[2:t,31] 
TEMP2 <- riots_new[2:t,32] 
DEWP2 <- riots_new[2:t,33] 
WDSP2 <- riots_new[2:t,34] 
PRCP2 <- riots_new[2:t,35] 
zinc_Open2 <- riots_new[2:t,36] 
zinc_Low2 <- riots_new[2:t,37] 
WTI_Close2 <- riots_new[2:t,38] 
WTI_Low2 <- riots_new[2:t,39] 
wheat_Open2 <- riots_new[2:t,40] 
wheat_High2 <- riots_new[2:t,41] 
wheat_Low2 <- riots_new[2:t,42] 
tin_Close2 <- riots_new[2:t,43] 
tin_High2 <- riots_new[2:t,44] 
tin_Low2 <- riots_new[2:t,45] 
sugar_Close2 <- riots_new[2:t,46] 
sugar_Open2 <- riots_new[2:t,47] 
sugar_Low2 <- riots_new[2:t,48] 
soybean_Close2 <- riots_new[2:t,49] 
soybean_Open2 <- riots_new[2:t,50] 
soybean_High2 <- riots_new[2:t,51] 
soybean_Low2 <- riots_new[2:t,52] 
silver_High2 <- riots_new[2:t,53] 
rice_Close2 <- riots_new[2:t,54] 
rice_High2 <- riots_new[2:t,55] 
platinum_Close2 <- riots_new[2:t,56] 
natural_gas_Close2 <- riots_new[2:t,57] 
monero_Close2 <- riots_new[2:t,58] 
monero_High2 <- riots_new[2:t,59] 
litecoin_Close2 <- riots_new[2:t,60] 
litecoin_Open2 <- riots_new[2:t,61] 
litecoin_Low2 <- riots_new[2:t,62] 
lead_High2 <- riots_new[2:t,63] 
lead_Low2 <- riots_new[2:t,64] 
Gold_Low2 <- riots_new[2:t,65] 
cotton_Low2 <- riots_new[2:t,66] 
corn_High2 <- riots_new[2:t,67] 
corn_Low2 <- riots_new[2:t,68] 
copper_Close2 <- riots_new[2:t,69] 
copper_High2 <- riots_new[2:t,70] 
coffee_Close2 <- riots_new[2:t,71] 
coffee_High2 <- riots_new[2:t,72] 
live_cattle_Close2 <- riots_new[2:t,73] 
live_cattle_High2 <- riots_new[2:t,74] 
live_cattle_Low2 <- riots_new[2:t,75] 
feed_cattle_Close2 <- riots_new[2:t,76] 
feed_cattle_Open2 <- riots_new[2:t,77] 
Brent_Close2 <- riots_new[2:t,78] 
Brent_Open2 <- riots_new[2:t,79] 
Bitcoin_Open2 <- riots_new[2:t,80] 
Bitcoin_Low2 <- riots_new[2:t,81] 
BAX_Open2 <- riots_new[2:t,82] 
BAX_High2 <- riots_new[2:t,83] 
count2 <- riots_new[2:t, 84]

# Now define the target
count3 <- riots_new[(2+s):(t+s), 84]

train <- data.frame(al_wafa_Total_Posts1, bahrain_moi_Total_Posts1, BahrainRights_Favorites1, BahrainRights_Retweets1, BBCArabic_Total_Posts1, BBCArabic_Favorites1, BBCArabic_Retweets1, bh14feb2011_Total_Posts1, bh14feb2011_Favorites1, bh14feb2011_Retweets1, bna_ar_Total_Posts1, bna_ar_Retweets1, Coalition14_Total_Posts1, Coalition14_Favorites1, duraz_youth_Total_Posts1, duraz_youth_Favorites1, duraz_youth_Retweets1, feb14revolution_Retweets1, GDNonline_Total_Posts1, GDNonline_Favorites1, GDNonline_Retweets1, Iran_Total_Posts1, Iran_Favorites1, IranNW_Retweets1, malarab1_Favorites1, NABEELRAJAB_Favorites1, netanyahu_Favorites1, netanyahu_Retweets1, rouhani_Total_Posts1, rouhani_Favorites1, USEmbassyManama_Retweets1, TEMP1, DEWP1, WDSP1, PRCP1, zinc_Open1, zinc_Low1, WTI_Close1, WTI_Low1, wheat_Open1, wheat_High1, wheat_Low1, tin_Close1, tin_High1, tin_Low1, sugar_Close1, sugar_Open1, sugar_Low1, soybean_Close1, soybean_Open1, soybean_High1, soybean_Low1, silver_High1, rice_Close1, rice_High1, platinum_Close1, natural_gas_Close1, monero_Close1, monero_High1, litecoin_Close1, litecoin_Open1, litecoin_Low1, lead_High1, lead_Low1, Gold_Low1, cotton_Low1, corn_High1, corn_Low1, copper_Close1, copper_High1, coffee_Close1, coffee_High1, live_cattle_Close1, live_cattle_High1, live_cattle_Low1, feed_cattle_Close1, feed_cattle_Open1, Brent_Close1, Brent_Open1, Bitcoin_Open1, Bitcoin_Low1, BAX_Open1, BAX_High1, count1, al_wafa_Total_Posts2, bahrain_moi_Total_Posts2, BahrainRights_Favorites2, BahrainRights_Retweets2, BBCArabic_Total_Posts2, BBCArabic_Favorites2, BBCArabic_Retweets2, bh14feb2011_Total_Posts2, bh14feb2011_Favorites2, bh14feb2011_Retweets2, bna_ar_Total_Posts2, bna_ar_Retweets2, Coalition14_Total_Posts2, Coalition14_Favorites2, duraz_youth_Total_Posts2, duraz_youth_Favorites2, duraz_youth_Retweets2, feb14revolution_Retweets2, GDNonline_Total_Posts2, GDNonline_Favorites2, GDNonline_Retweets2, Iran_Total_Posts2, Iran_Favorites2, IranNW_Retweets2, malarab1_Favorites2, NABEELRAJAB_Favorites2, netanyahu_Favorites2, netanyahu_Retweets2, rouhani_Total_Posts2, rouhani_Favorites2, USEmbassyManama_Retweets2, TEMP2, DEWP2, WDSP2, PRCP2, zinc_Open2, zinc_Low2, WTI_Close2, WTI_Low2, wheat_Open2, wheat_High2, wheat_Low2, tin_Close2, tin_High2, tin_Low2, sugar_Close2, sugar_Open2, sugar_Low2, soybean_Close2, soybean_Open2, soybean_High2, soybean_Low2, silver_High2, rice_Close2, rice_High2, platinum_Close2, natural_gas_Close2, monero_Close2, monero_High2, litecoin_Close2, litecoin_Open2, litecoin_Low2, lead_High2, lead_Low2, Gold_Low2, cotton_Low2, corn_High2, corn_Low2, copper_Close2, copper_High2, coffee_Close2, coffee_High2, live_cattle_Close2, live_cattle_High2, live_cattle_Low2, feed_cattle_Close2, feed_cattle_Open2, Brent_Close2, Brent_Open2, Bitcoin_Open2, Bitcoin_Low2, BAX_Open2, BAX_High2, count2, count3)

#summary(train)

### Defining a test set
# Define predictors
## For time t - 1
al_wafa_Total_Posts1 <- riots_new[t:(t+p-1),1] 
bahrain_moi_Total_Posts1 <- riots_new[t:(t+p-1),2] 
BahrainRights_Favorites1 <- riots_new[t:(t+p-1),3] 
BahrainRights_Retweets1 <- riots_new[t:(t+p-1),4] 
BBCArabic_Total_Posts1 <- riots_new[t:(t+p-1),5] 
BBCArabic_Favorites1 <- riots_new[t:(t+p-1),6] 
BBCArabic_Retweets1 <- riots_new[t:(t+p-1),7] 
bh14feb2011_Total_Posts1 <- riots_new[t:(t+p-1),8] 
bh14feb2011_Favorites1 <- riots_new[t:(t+p-1),9] 
bh14feb2011_Retweets1 <- riots_new[t:(t+p-1),10] 
bna_ar_Total_Posts1 <- riots_new[t:(t+p-1),11] 
bna_ar_Retweets1 <- riots_new[t:(t+p-1),12] 
Coalition14_Total_Posts1 <- riots_new[t:(t+p-1),13] 
Coalition14_Favorites1 <- riots_new[t:(t+p-1),14] 
duraz_youth_Total_Posts1 <- riots_new[t:(t+p-1),15] 
duraz_youth_Favorites1 <- riots_new[t:(t+p-1),16] 
duraz_youth_Retweets1 <- riots_new[t:(t+p-1),17] 
feb14revolution_Retweets1 <- riots_new[t:(t+p-1),18] 
GDNonline_Total_Posts1 <- riots_new[t:(t+p-1),19] 
GDNonline_Favorites1 <- riots_new[t:(t+p-1),20] 
GDNonline_Retweets1 <- riots_new[t:(t+p-1),21] 
Iran_Total_Posts1 <- riots_new[t:(t+p-1),22] 
Iran_Favorites1 <- riots_new[t:(t+p-1),23] 
IranNW_Retweets1 <- riots_new[t:(t+p-1),24] 
malarab1_Favorites1 <- riots_new[t:(t+p-1),25] 
NABEELRAJAB_Favorites1 <- riots_new[t:(t+p-1),26] 
netanyahu_Favorites1 <- riots_new[t:(t+p-1),27] 
netanyahu_Retweets1 <- riots_new[t:(t+p-1),28] 
rouhani_Total_Posts1 <- riots_new[t:(t+p-1),29] 
rouhani_Favorites1 <- riots_new[t:(t+p-1),30] 
USEmbassyManama_Retweets1 <- riots_new[t:(t+p-1),31] 
TEMP1 <- riots_new[t:(t+p-1),32] 
DEWP1 <- riots_new[t:(t+p-1),33] 
WDSP1 <- riots_new[t:(t+p-1),34] 
PRCP1 <- riots_new[t:(t+p-1),35] 
zinc_Open1 <- riots_new[t:(t+p-1),36] 
zinc_Low1 <- riots_new[t:(t+p-1),37] 
WTI_Close1 <- riots_new[t:(t+p-1),38] 
WTI_Low1 <- riots_new[t:(t+p-1),39] 
wheat_Open1 <- riots_new[t:(t+p-1),40] 
wheat_High1 <- riots_new[t:(t+p-1),41] 
wheat_Low1 <- riots_new[t:(t+p-1),42] 
tin_Close1 <- riots_new[t:(t+p-1),43] 
tin_High1 <- riots_new[t:(t+p-1),44] 
tin_Low1 <- riots_new[t:(t+p-1),45] 
sugar_Close1 <- riots_new[t:(t+p-1),46] 
sugar_Open1 <- riots_new[t:(t+p-1),47] 
sugar_Low1 <- riots_new[t:(t+p-1),48] 
soybean_Close1 <- riots_new[t:(t+p-1),49] 
soybean_Open1 <- riots_new[t:(t+p-1),50] 
soybean_High1 <- riots_new[t:(t+p-1),51] 
soybean_Low1 <- riots_new[t:(t+p-1),52] 
silver_High1 <- riots_new[t:(t+p-1),53] 
rice_Close1 <- riots_new[t:(t+p-1),54] 
rice_High1 <- riots_new[t:(t+p-1),55] 
platinum_Close1 <- riots_new[t:(t+p-1),56] 
natural_gas_Close1 <- riots_new[t:(t+p-1),57] 
monero_Close1 <- riots_new[t:(t+p-1),58] 
monero_High1 <- riots_new[t:(t+p-1),59] 
litecoin_Close1 <- riots_new[t:(t+p-1),60] 
litecoin_Open1 <- riots_new[t:(t+p-1),61] 
litecoin_Low1 <- riots_new[t:(t+p-1),62] 
lead_High1 <- riots_new[t:(t+p-1),63] 
lead_Low1 <- riots_new[t:(t+p-1),64] 
Gold_Low1 <- riots_new[t:(t+p-1),65] 
cotton_Low1 <- riots_new[t:(t+p-1),66] 
corn_High1 <- riots_new[t:(t+p-1),67] 
corn_Low1 <- riots_new[t:(t+p-1),68] 
copper_Close1 <- riots_new[t:(t+p-1),69] 
copper_High1 <- riots_new[t:(t+p-1),70] 
coffee_Close1 <- riots_new[t:(t+p-1),71] 
coffee_High1 <- riots_new[t:(t+p-1),72] 
live_cattle_Close1 <- riots_new[t:(t+p-1),73] 
live_cattle_High1 <- riots_new[t:(t+p-1),74] 
live_cattle_Low1 <- riots_new[t:(t+p-1),75] 
feed_cattle_Close1 <- riots_new[t:(t+p-1),76] 
feed_cattle_Open1 <- riots_new[t:(t+p-1),77] 
Brent_Close1 <- riots_new[t:(t+p-1),78] 
Brent_Open1 <- riots_new[t:(t+p-1),79] 
Bitcoin_Open1 <- riots_new[t:(t+p-1),80] 
Bitcoin_Low1 <- riots_new[t:(t+p-1),81] 
BAX_Open1 <- riots_new[t:(t+p-1),82] 
BAX_High1 <- riots_new[t:(t+p-1),83] 
count1 <- riots_new[t:(t+p-1), 84]

## For time t
al_wafa_Total_Posts2 <- riots_new[(t+1):(t+p),1] 
bahrain_moi_Total_Posts2 <- riots_new[(t+1):(t+p),2] 
BahrainRights_Favorites2 <- riots_new[(t+1):(t+p),3] 
BahrainRights_Retweets2 <- riots_new[(t+1):(t+p),4] 
BBCArabic_Total_Posts2 <- riots_new[(t+1):(t+p),5] 
BBCArabic_Favorites2 <- riots_new[(t+1):(t+p),6] 
BBCArabic_Retweets2 <- riots_new[(t+1):(t+p),7] 
bh14feb2011_Total_Posts2 <- riots_new[(t+1):(t+p),8] 
bh14feb2011_Favorites2 <- riots_new[(t+1):(t+p),9] 
bh14feb2011_Retweets2 <- riots_new[(t+1):(t+p),10] 
bna_ar_Total_Posts2 <- riots_new[(t+1):(t+p),11] 
bna_ar_Retweets2 <- riots_new[(t+1):(t+p),12] 
Coalition14_Total_Posts2 <- riots_new[(t+1):(t+p),13] 
Coalition14_Favorites2 <- riots_new[(t+1):(t+p),14] 
duraz_youth_Total_Posts2 <- riots_new[(t+1):(t+p),15] 
duraz_youth_Favorites2 <- riots_new[(t+1):(t+p),16] 
duraz_youth_Retweets2 <- riots_new[(t+1):(t+p),17] 
feb14revolution_Retweets2 <- riots_new[(t+1):(t+p),18] 
GDNonline_Total_Posts2 <- riots_new[(t+1):(t+p),19] 
GDNonline_Favorites2 <- riots_new[(t+1):(t+p),20] 
GDNonline_Retweets2 <- riots_new[(t+1):(t+p),21] 
Iran_Total_Posts2 <- riots_new[(t+1):(t+p),22] 
Iran_Favorites2 <- riots_new[(t+1):(t+p),23] 
IranNW_Retweets2 <- riots_new[(t+1):(t+p),24] 
malarab1_Favorites2 <- riots_new[(t+1):(t+p),25] 
NABEELRAJAB_Favorites2 <- riots_new[(t+1):(t+p),26] 
netanyahu_Favorites2 <- riots_new[(t+1):(t+p),27] 
netanyahu_Retweets2 <- riots_new[(t+1):(t+p),28] 
rouhani_Total_Posts2 <- riots_new[(t+1):(t+p),29] 
rouhani_Favorites2 <- riots_new[(t+1):(t+p),30] 
USEmbassyManama_Retweets2 <- riots_new[(t+1):(t+p),31] 
TEMP2 <- riots_new[(t+1):(t+p),32] 
DEWP2 <- riots_new[(t+1):(t+p),33] 
WDSP2 <- riots_new[(t+1):(t+p),34] 
PRCP2 <- riots_new[(t+1):(t+p),35] 
zinc_Open2 <- riots_new[(t+1):(t+p),36] 
zinc_Low2 <- riots_new[(t+1):(t+p),37] 
WTI_Close2 <- riots_new[(t+1):(t+p),38] 
WTI_Low2 <- riots_new[(t+1):(t+p),39] 
wheat_Open2 <- riots_new[(t+1):(t+p),40] 
wheat_High2 <- riots_new[(t+1):(t+p),41] 
wheat_Low2 <- riots_new[(t+1):(t+p),42] 
tin_Close2 <- riots_new[(t+1):(t+p),43] 
tin_High2 <- riots_new[(t+1):(t+p),44] 
tin_Low2 <- riots_new[(t+1):(t+p),45] 
sugar_Close2 <- riots_new[(t+1):(t+p),46] 
sugar_Open2 <- riots_new[(t+1):(t+p),47] 
sugar_Low2 <- riots_new[(t+1):(t+p),48] 
soybean_Close2 <- riots_new[(t+1):(t+p),49] 
soybean_Open2 <- riots_new[(t+1):(t+p),50] 
soybean_High2 <- riots_new[(t+1):(t+p),51] 
soybean_Low2 <- riots_new[(t+1):(t+p),52] 
silver_High2 <- riots_new[(t+1):(t+p),53] 
rice_Close2 <- riots_new[(t+1):(t+p),54] 
rice_High2 <- riots_new[(t+1):(t+p),55] 
platinum_Close2 <- riots_new[(t+1):(t+p),56] 
natural_gas_Close2 <- riots_new[(t+1):(t+p),57] 
monero_Close2 <- riots_new[(t+1):(t+p),58] 
monero_High2 <- riots_new[(t+1):(t+p),59] 
litecoin_Close2 <- riots_new[(t+1):(t+p),60] 
litecoin_Open2 <- riots_new[(t+1):(t+p),61] 
litecoin_Low2 <- riots_new[(t+1):(t+p),62] 
lead_High2 <- riots_new[(t+1):(t+p),63] 
lead_Low2 <- riots_new[(t+1):(t+p),64] 
Gold_Low2 <- riots_new[(t+1):(t+p),65] 
cotton_Low2 <- riots_new[(t+1):(t+p),66] 
corn_High2 <- riots_new[(t+1):(t+p),67] 
corn_Low2 <- riots_new[(t+1):(t+p),68] 
copper_Close2 <- riots_new[(t+1):(t+p),69] 
copper_High2 <- riots_new[(t+1):(t+p),70] 
coffee_Close2 <- riots_new[(t+1):(t+p),71] 
coffee_High2 <- riots_new[(t+1):(t+p),72] 
live_cattle_Close2 <- riots_new[(t+1):(t+p),73] 
live_cattle_High2 <- riots_new[(t+1):(t+p),74] 
live_cattle_Low2 <- riots_new[(t+1):(t+p),75] 
feed_cattle_Close2 <- riots_new[(t+1):(t+p),76] 
feed_cattle_Open2 <- riots_new[(t+1):(t+p),77] 
Brent_Close2 <- riots_new[(t+1):(t+p),78] 
Brent_Open2 <- riots_new[(t+1):(t+p),79] 
Bitcoin_Open2 <- riots_new[(t+1):(t+p),80] 
Bitcoin_Low2 <- riots_new[(t+1):(t+p),81] 
BAX_Open2 <- riots_new[(t+1):(t+p),82] 
BAX_High2 <- riots_new[(t+1):(t+p),83]
count2 <- riots_new[(t+1):(t+p), 84]

# Now define the target
count3 <- riots_new[(t+1+s):(t+p+s), 84]

test <- data.frame(al_wafa_Total_Posts1, bahrain_moi_Total_Posts1, BahrainRights_Favorites1, BahrainRights_Retweets1, BBCArabic_Total_Posts1, BBCArabic_Favorites1, BBCArabic_Retweets1, bh14feb2011_Total_Posts1, bh14feb2011_Favorites1, bh14feb2011_Retweets1, bna_ar_Total_Posts1, bna_ar_Retweets1, Coalition14_Total_Posts1, Coalition14_Favorites1, duraz_youth_Total_Posts1, duraz_youth_Favorites1, duraz_youth_Retweets1, feb14revolution_Retweets1, GDNonline_Total_Posts1, GDNonline_Favorites1, GDNonline_Retweets1, Iran_Total_Posts1, Iran_Favorites1, IranNW_Retweets1, malarab1_Favorites1, NABEELRAJAB_Favorites1, netanyahu_Favorites1, netanyahu_Retweets1, rouhani_Total_Posts1, rouhani_Favorites1, USEmbassyManama_Retweets1, TEMP1, DEWP1, WDSP1, PRCP1, zinc_Open1, zinc_Low1, WTI_Close1, WTI_Low1, wheat_Open1, wheat_High1, wheat_Low1, tin_Close1, tin_High1, tin_Low1, sugar_Close1, sugar_Open1, sugar_Low1, soybean_Close1, soybean_Open1, soybean_High1, soybean_Low1, silver_High1, rice_Close1, rice_High1, platinum_Close1, natural_gas_Close1, monero_Close1, monero_High1, litecoin_Close1, litecoin_Open1, litecoin_Low1, lead_High1, lead_Low1, Gold_Low1, cotton_Low1, corn_High1, corn_Low1, copper_Close1, copper_High1, coffee_Close1, coffee_High1, live_cattle_Close1, live_cattle_High1, live_cattle_Low1, feed_cattle_Close1, feed_cattle_Open1, Brent_Close1, Brent_Open1, Bitcoin_Open1, Bitcoin_Low1, BAX_Open1, BAX_High1, count1, al_wafa_Total_Posts2, bahrain_moi_Total_Posts2, BahrainRights_Favorites2, BahrainRights_Retweets2, BBCArabic_Total_Posts2, BBCArabic_Favorites2, BBCArabic_Retweets2, bh14feb2011_Total_Posts2, bh14feb2011_Favorites2, bh14feb2011_Retweets2, bna_ar_Total_Posts2, bna_ar_Retweets2, Coalition14_Total_Posts2, Coalition14_Favorites2, duraz_youth_Total_Posts2, duraz_youth_Favorites2, duraz_youth_Retweets2, feb14revolution_Retweets2, GDNonline_Total_Posts2, GDNonline_Favorites2, GDNonline_Retweets2, Iran_Total_Posts2, Iran_Favorites2, IranNW_Retweets2, malarab1_Favorites2, NABEELRAJAB_Favorites2, netanyahu_Favorites2, netanyahu_Retweets2, rouhani_Total_Posts2, rouhani_Favorites2, USEmbassyManama_Retweets2, TEMP2, DEWP2, WDSP2, PRCP2, zinc_Open2, zinc_Low2, WTI_Close2, WTI_Low2, wheat_Open2, wheat_High2, wheat_Low2, tin_Close2, tin_High2, tin_Low2, sugar_Close2, sugar_Open2, sugar_Low2, soybean_Close2, soybean_Open2, soybean_High2, soybean_Low2, silver_High2, rice_Close2, rice_High2, platinum_Close2, natural_gas_Close2, monero_Close2, monero_High2, litecoin_Close2, litecoin_Open2, litecoin_Low2, lead_High2, lead_Low2, Gold_Low2, cotton_Low2, corn_High2, corn_Low2, copper_Close2, copper_High2, coffee_Close2, coffee_High2, live_cattle_Close2, live_cattle_High2, live_cattle_Low2, feed_cattle_Close2, feed_cattle_Open2, Brent_Close2, Brent_Open2, Bitcoin_Open2, Bitcoin_Low2, BAX_Open2, BAX_High2, count2, count3)

#summary(test)

## Applying RF
RandomForest <- randomForest(count3 ~ ., data=train, importance = TRUE, ntrees = nrow(train))

predRF <- predict(RandomForest, newdata = test, type = "response")
#summary(predRF)

### MSE
mse_PCA <- sum((predRF - test$count3)^2)/length(test$count3)

## MAPE
## Finding zero count
index <- which(test$count3 == 0)
## MAPE
mape_PCA <- sum(abs(predRF[-index] - test$count3[-index])/test$count3[-index])/length(test$count3[-index])
## [1] "MASE for the PCA target variable is 4.1481"
## [1] "MAPE for the PCA target variable is 0.3338"

First, we plot how it did with the train set. Black line is the truth and red line is the prediction.

plot(1:length(train$count3), train$count3, type="l", main = 'Rolling Horizon Bahrain PCA (01JAN16 - 31DEC19)', xlab = 'Date', ylab = 'Count')
lines(1:length(RandomForest$predicted), RandomForest$predicted, col = "red")
legend(900, 29, legend=c("True Count", "Rolling Hoizon/RF Forecast"),
       col=c("black", "red"),  lty=1:1, cex=0.8)

Now we plot how it did with the test set. Black line is the truth and red line is the prediction.

plot(1:length(test$count3), test$count3, type="l", ylim = c(0,3.5), main = 'Rolling Horizon Bahrain PCA (Test Set)', xlab = 'Date', ylab = 'Count')
lines(1:length(predRF), predRF, col = "red")
legend(1, 1, legend=c("True Count", "Rolling Hoizon/RF Forecast"),
       col=c("black", "red"),  lty=1:1, cex=0.8)

Rolling Horizon & Random Forest Design (Numerical Data)

RF.Pre <- list()
RF.MSE <- list()
RF.MAPE <- list()
RF.Pre[[1]] <- predRF
RF.MSE[[1]] <- mse_PCA
RF.MAPE[[1]] <- mape_PCA
k <- 1

#for(t in (i.train.sample + s + 1):(d[1]-p-s)){
for(t in (i.train.sample + s + 1):(i.train.sample + s + 11)){
  k <- k + 1
  ### Defining a train set
  # Define predictors
  ## For time t - 1
  al_wafa_Total_Posts1 <- riots_new[1:(t-1),1] 
  bahrain_moi_Total_Posts1 <- riots_new[1:(t-1),2] 
  BahrainRights_Favorites1 <- riots_new[1:(t-1),3] 
  BahrainRights_Retweets1 <- riots_new[1:(t-1),4] 
  BBCArabic_Total_Posts1 <- riots_new[1:(t-1),5] 
  BBCArabic_Favorites1 <- riots_new[1:(t-1),6] 
  BBCArabic_Retweets1 <- riots_new[1:(t-1),7] 
  bh14feb2011_Total_Posts1 <- riots_new[1:(t-1),8] 
  bh14feb2011_Favorites1 <- riots_new[1:(t-1),9] 
  bh14feb2011_Retweets1 <- riots_new[1:(t-1),10] 
  bna_ar_Total_Posts1 <- riots_new[1:(t-1),11] 
  bna_ar_Retweets1 <- riots_new[1:(t-1),12] 
  Coalition14_Total_Posts1 <- riots_new[1:(t-1),13] 
  Coalition14_Favorites1 <- riots_new[1:(t-1),14] 
  duraz_youth_Total_Posts1 <- riots_new[1:(t-1),15] 
  duraz_youth_Favorites1 <- riots_new[1:(t-1),16] 
  duraz_youth_Retweets1 <- riots_new[1:(t-1),17] 
  feb14revolution_Retweets1 <- riots_new[1:(t-1),18] 
  GDNonline_Total_Posts1 <- riots_new[1:(t-1),19] 
  GDNonline_Favorites1 <- riots_new[1:(t-1),20] 
  GDNonline_Retweets1 <- riots_new[1:(t-1),21] 
  Iran_Total_Posts1 <- riots_new[1:(t-1),22] 
  Iran_Favorites1 <- riots_new[1:(t-1),23] 
  IranNW_Retweets1 <- riots_new[1:(t-1),24] 
  malarab1_Favorites1 <- riots_new[1:(t-1),25] 
  NABEELRAJAB_Favorites1 <- riots_new[1:(t-1),26] 
  netanyahu_Favorites1 <- riots_new[1:(t-1),27] 
  netanyahu_Retweets1 <- riots_new[1:(t-1),28] 
  rouhani_Total_Posts1 <- riots_new[1:(t-1),29] 
  rouhani_Favorites1 <- riots_new[1:(t-1),30] 
  USEmbassyManama_Retweets1 <- riots_new[1:(t-1),31] 
  TEMP1 <- riots_new[1:(t-1),32] 
  DEWP1 <- riots_new[1:(t-1),33] 
  WDSP1 <- riots_new[1:(t-1),34] 
  PRCP1 <- riots_new[1:(t-1),35] 
  zinc_Open1 <- riots_new[1:(t-1),36] 
  zinc_Low1 <- riots_new[1:(t-1),37] 
  WTI_Close1 <- riots_new[1:(t-1),38] 
  WTI_Low1 <- riots_new[1:(t-1),39] 
  wheat_Open1 <- riots_new[1:(t-1),40] 
  wheat_High1 <- riots_new[1:(t-1),41] 
  wheat_Low1 <- riots_new[1:(t-1),42] 
  tin_Close1 <- riots_new[1:(t-1),43] 
  tin_High1 <- riots_new[1:(t-1),44] 
  tin_Low1 <- riots_new[1:(t-1),45] 
  sugar_Close1 <- riots_new[1:(t-1),46] 
  sugar_Open1 <- riots_new[1:(t-1),47] 
  sugar_Low1 <- riots_new[1:(t-1),48] 
  soybean_Close1 <- riots_new[1:(t-1),49] 
  soybean_Open1 <- riots_new[1:(t-1),50] 
  soybean_High1 <- riots_new[1:(t-1),51] 
  soybean_Low1 <- riots_new[1:(t-1),52] 
  silver_High1 <- riots_new[1:(t-1),53] 
  rice_Close1 <- riots_new[1:(t-1),54] 
  rice_High1 <- riots_new[1:(t-1),55] 
  platinum_Close1 <- riots_new[1:(t-1),56] 
  natural_gas_Close1 <- riots_new[1:(t-1),57] 
  monero_Close1 <- riots_new[1:(t-1),58] 
  monero_High1 <- riots_new[1:(t-1),59] 
  litecoin_Close1 <- riots_new[1:(t-1),60] 
  litecoin_Open1 <- riots_new[1:(t-1),61] 
  litecoin_Low1 <- riots_new[1:(t-1),62] 
  lead_High1 <- riots_new[1:(t-1),63] 
  lead_Low1 <- riots_new[1:(t-1),64] 
  Gold_Low1 <- riots_new[1:(t-1),65] 
  cotton_Low1 <- riots_new[1:(t-1),66] 
  corn_High1 <- riots_new[1:(t-1),67] 
  corn_Low1 <- riots_new[1:(t-1),68] 
  copper_Close1 <- riots_new[1:(t-1),69] 
  copper_High1 <- riots_new[1:(t-1),70] 
  coffee_Close1 <- riots_new[1:(t-1),71] 
  coffee_High1 <- riots_new[1:(t-1),72] 
  live_cattle_Close1 <- riots_new[1:(t-1),73] 
  live_cattle_High1 <- riots_new[1:(t-1),74] 
  live_cattle_Low1 <- riots_new[1:(t-1),75] 
  feed_cattle_Close1 <- riots_new[1:(t-1),76] 
  feed_cattle_Open1 <- riots_new[1:(t-1),77] 
  Brent_Close1 <- riots_new[1:(t-1),78] 
  Brent_Open1 <- riots_new[1:(t-1),79] 
  Bitcoin_Open1 <- riots_new[1:(t-1),80] 
  Bitcoin_Low1 <- riots_new[1:(t-1),81] 
  BAX_Open1 <- riots_new[1:(t-1),82] 
  BAX_High1 <- riots_new[1:(t-1),83] 
  pca_count1 <- riots_new[1:(t-1), 84]
  
  ## For time t
  al_wafa_Total_Posts2 <- riots_new[2:t,1] 
  bahrain_moi_Total_Posts2 <- riots_new[2:t,2] 
  BahrainRights_Favorites2 <- riots_new[2:t,3] 
  BahrainRights_Retweets2 <- riots_new[2:t,4] 
  BBCArabic_Total_Posts2 <- riots_new[2:t,5] 
  BBCArabic_Favorites2 <- riots_new[2:t,6] 
  BBCArabic_Retweets2 <- riots_new[2:t,7] 
  bh14feb2011_Total_Posts2 <- riots_new[2:t,8] 
  bh14feb2011_Favorites2 <- riots_new[2:t,9] 
  bh14feb2011_Retweets2 <- riots_new[2:t,10] 
  bna_ar_Total_Posts2 <- riots_new[2:t,11] 
  bna_ar_Retweets2 <- riots_new[2:t,12] 
  Coalition14_Total_Posts2 <- riots_new[2:t,13] 
  Coalition14_Favorites2 <- riots_new[2:t,14] 
  duraz_youth_Total_Posts2 <- riots_new[2:t,15] 
  duraz_youth_Favorites2 <- riots_new[2:t,16] 
  duraz_youth_Retweets2 <- riots_new[2:t,17] 
  feb14revolution_Retweets2 <- riots_new[2:t,18] 
  GDNonline_Total_Posts2 <- riots_new[2:t,19] 
  GDNonline_Favorites2 <- riots_new[2:t,20] 
  GDNonline_Retweets2 <- riots_new[2:t,21] 
  Iran_Total_Posts2 <- riots_new[2:t,22] 
  Iran_Favorites2 <- riots_new[2:t,23] 
  IranNW_Retweets2 <- riots_new[2:t,24] 
  malarab1_Favorites2 <- riots_new[2:t,25] 
  NABEELRAJAB_Favorites2 <- riots_new[2:t,26] 
  netanyahu_Favorites2 <- riots_new[2:t,27] 
  netanyahu_Retweets2 <- riots_new[2:t,28] 
  rouhani_Total_Posts2 <- riots_new[2:t,29] 
  rouhani_Favorites2 <- riots_new[2:t,30] 
  USEmbassyManama_Retweets2 <- riots_new[2:t,31] 
  TEMP2 <- riots_new[2:t,32] 
  DEWP2 <- riots_new[2:t,33] 
  WDSP2 <- riots_new[2:t,34] 
  PRCP2 <- riots_new[2:t,35] 
  zinc_Open2 <- riots_new[2:t,36] 
  zinc_Low2 <- riots_new[2:t,37] 
  WTI_Close2 <- riots_new[2:t,38] 
  WTI_Low2 <- riots_new[2:t,39] 
  wheat_Open2 <- riots_new[2:t,40] 
  wheat_High2 <- riots_new[2:t,41] 
  wheat_Low2 <- riots_new[2:t,42] 
  tin_Close2 <- riots_new[2:t,43] 
  tin_High2 <- riots_new[2:t,44] 
  tin_Low2 <- riots_new[2:t,45] 
  sugar_Close2 <- riots_new[2:t,46] 
  sugar_Open2 <- riots_new[2:t,47] 
  sugar_Low2 <- riots_new[2:t,48] 
  soybean_Close2 <- riots_new[2:t,49] 
  soybean_Open2 <- riots_new[2:t,50] 
  soybean_High2 <- riots_new[2:t,51] 
  soybean_Low2 <- riots_new[2:t,52] 
  silver_High2 <- riots_new[2:t,53] 
  rice_Close2 <- riots_new[2:t,54] 
  rice_High2 <- riots_new[2:t,55] 
  platinum_Close2 <- riots_new[2:t,56] 
  natural_gas_Close2 <- riots_new[2:t,57] 
  monero_Close2 <- riots_new[2:t,58] 
  monero_High2 <- riots_new[2:t,59] 
  litecoin_Close2 <- riots_new[2:t,60] 
  litecoin_Open2 <- riots_new[2:t,61] 
  litecoin_Low2 <- riots_new[2:t,62] 
  lead_High2 <- riots_new[2:t,63] 
  lead_Low2 <- riots_new[2:t,64] 
  Gold_Low2 <- riots_new[2:t,65] 
  cotton_Low2 <- riots_new[2:t,66] 
  corn_High2 <- riots_new[2:t,67] 
  corn_Low2 <- riots_new[2:t,68] 
  copper_Close2 <- riots_new[2:t,69] 
  copper_High2 <- riots_new[2:t,70] 
  coffee_Close2 <- riots_new[2:t,71] 
  coffee_High2 <- riots_new[2:t,72] 
  live_cattle_Close2 <- riots_new[2:t,73] 
  live_cattle_High2 <- riots_new[2:t,74] 
  live_cattle_Low2 <- riots_new[2:t,75] 
  feed_cattle_Close2 <- riots_new[2:t,76] 
  feed_cattle_Open2 <- riots_new[2:t,77] 
  Brent_Close2 <- riots_new[2:t,78] 
  Brent_Open2 <- riots_new[2:t,79] 
  Bitcoin_Open2 <- riots_new[2:t,80] 
  Bitcoin_Low2 <- riots_new[2:t,81] 
  BAX_Open2 <- riots_new[2:t,82] 
  BAX_High2 <- riots_new[2:t,83] 
  pca_count2 <- riots_new[2:t, 84]
  
  # Now define the target
  pca_count3 <- riots_new[(2+s):(t+s), 84]
  
  train <- data.frame(al_wafa_Total_Posts1, bahrain_moi_Total_Posts1, BahrainRights_Favorites1, BahrainRights_Retweets1, BBCArabic_Total_Posts1, BBCArabic_Favorites1, BBCArabic_Retweets1, bh14feb2011_Total_Posts1, bh14feb2011_Favorites1, bh14feb2011_Retweets1, bna_ar_Total_Posts1, bna_ar_Retweets1, Coalition14_Total_Posts1, Coalition14_Favorites1, duraz_youth_Total_Posts1, duraz_youth_Favorites1, duraz_youth_Retweets1, feb14revolution_Retweets1, GDNonline_Total_Posts1, GDNonline_Favorites1, GDNonline_Retweets1, Iran_Total_Posts1, Iran_Favorites1, IranNW_Retweets1, malarab1_Favorites1, NABEELRAJAB_Favorites1, netanyahu_Favorites1, netanyahu_Retweets1, rouhani_Total_Posts1, rouhani_Favorites1, USEmbassyManama_Retweets1, TEMP1, DEWP1, WDSP1, PRCP1, zinc_Open1, zinc_Low1, WTI_Close1, WTI_Low1, wheat_Open1, wheat_High1, wheat_Low1, tin_Close1, tin_High1, tin_Low1, sugar_Close1, sugar_Open1, sugar_Low1, soybean_Close1, soybean_Open1, soybean_High1, soybean_Low1, silver_High1, rice_Close1, rice_High1, platinum_Close1, natural_gas_Close1, monero_Close1, monero_High1, litecoin_Close1, litecoin_Open1, litecoin_Low1, lead_High1, lead_Low1, Gold_Low1, cotton_Low1, corn_High1, corn_Low1, copper_Close1, copper_High1, coffee_Close1, coffee_High1, live_cattle_Close1, live_cattle_High1, live_cattle_Low1, feed_cattle_Close1, feed_cattle_Open1, Brent_Close1, Brent_Open1, Bitcoin_Open1, Bitcoin_Low1, BAX_Open1, BAX_High1, pca_count1, al_wafa_Total_Posts2, bahrain_moi_Total_Posts2, BahrainRights_Favorites2, BahrainRights_Retweets2, BBCArabic_Total_Posts2, BBCArabic_Favorites2, BBCArabic_Retweets2, bh14feb2011_Total_Posts2, bh14feb2011_Favorites2, bh14feb2011_Retweets2, bna_ar_Total_Posts2, bna_ar_Retweets2, Coalition14_Total_Posts2, Coalition14_Favorites2, duraz_youth_Total_Posts2, duraz_youth_Favorites2, duraz_youth_Retweets2, feb14revolution_Retweets2, GDNonline_Total_Posts2, GDNonline_Favorites2, GDNonline_Retweets2, Iran_Total_Posts2, Iran_Favorites2, IranNW_Retweets2, malarab1_Favorites2, NABEELRAJAB_Favorites2, netanyahu_Favorites2, netanyahu_Retweets2, rouhani_Total_Posts2, rouhani_Favorites2, USEmbassyManama_Retweets2, TEMP2, DEWP2, WDSP2, PRCP2, zinc_Open2, zinc_Low2, WTI_Close2, WTI_Low2, wheat_Open2, wheat_High2, wheat_Low2, tin_Close2, tin_High2, tin_Low2, sugar_Close2, sugar_Open2, sugar_Low2, soybean_Close2, soybean_Open2, soybean_High2, soybean_Low2, silver_High2, rice_Close2, rice_High2, platinum_Close2, natural_gas_Close2, monero_Close2, monero_High2, litecoin_Close2, litecoin_Open2, litecoin_Low2, lead_High2, lead_Low2, Gold_Low2, cotton_Low2, corn_High2, corn_Low2, copper_Close2, copper_High2, coffee_Close2, coffee_High2, live_cattle_Close2, live_cattle_High2, live_cattle_Low2, feed_cattle_Close2, feed_cattle_Open2, Brent_Close2, Brent_Open2, Bitcoin_Open2, Bitcoin_Low2, BAX_Open2, BAX_High2, pca_count2, pca_count3)
  
  #summary(train)
  
  ### Defining a test set
  # Define predictors
  ## For time t - 1
  al_wafa_Total_Posts1 <- riots_new[t:(t+p-1),1] 
  bahrain_moi_Total_Posts1 <- riots_new[t:(t+p-1),2] 
  BahrainRights_Favorites1 <- riots_new[t:(t+p-1),3] 
  BahrainRights_Retweets1 <- riots_new[t:(t+p-1),4] 
  BBCArabic_Total_Posts1 <- riots_new[t:(t+p-1),5] 
  BBCArabic_Favorites1 <- riots_new[t:(t+p-1),6] 
  BBCArabic_Retweets1 <- riots_new[t:(t+p-1),7] 
  bh14feb2011_Total_Posts1 <- riots_new[t:(t+p-1),8] 
  bh14feb2011_Favorites1 <- riots_new[t:(t+p-1),9] 
  bh14feb2011_Retweets1 <- riots_new[t:(t+p-1),10] 
  bna_ar_Total_Posts1 <- riots_new[t:(t+p-1),11] 
  bna_ar_Retweets1 <- riots_new[t:(t+p-1),12] 
  Coalition14_Total_Posts1 <- riots_new[t:(t+p-1),13] 
  Coalition14_Favorites1 <- riots_new[t:(t+p-1),14] 
  duraz_youth_Total_Posts1 <- riots_new[t:(t+p-1),15] 
  duraz_youth_Favorites1 <- riots_new[t:(t+p-1),16] 
  duraz_youth_Retweets1 <- riots_new[t:(t+p-1),17] 
  feb14revolution_Retweets1 <- riots_new[t:(t+p-1),18] 
  GDNonline_Total_Posts1 <- riots_new[t:(t+p-1),19] 
  GDNonline_Favorites1 <- riots_new[t:(t+p-1),20] 
  GDNonline_Retweets1 <- riots_new[t:(t+p-1),21] 
  Iran_Total_Posts1 <- riots_new[t:(t+p-1),22] 
  Iran_Favorites1 <- riots_new[t:(t+p-1),23] 
  IranNW_Retweets1 <- riots_new[t:(t+p-1),24] 
  malarab1_Favorites1 <- riots_new[t:(t+p-1),25] 
  NABEELRAJAB_Favorites1 <- riots_new[t:(t+p-1),26] 
  netanyahu_Favorites1 <- riots_new[t:(t+p-1),27] 
  netanyahu_Retweets1 <- riots_new[t:(t+p-1),28] 
  rouhani_Total_Posts1 <- riots_new[t:(t+p-1),29] 
  rouhani_Favorites1 <- riots_new[t:(t+p-1),30] 
  USEmbassyManama_Retweets1 <- riots_new[t:(t+p-1),31] 
  TEMP1 <- riots_new[t:(t+p-1),32] 
  DEWP1 <- riots_new[t:(t+p-1),33] 
  WDSP1 <- riots_new[t:(t+p-1),34] 
  PRCP1 <- riots_new[t:(t+p-1),35] 
  zinc_Open1 <- riots_new[t:(t+p-1),36] 
  zinc_Low1 <- riots_new[t:(t+p-1),37] 
  WTI_Close1 <- riots_new[t:(t+p-1),38] 
  WTI_Low1 <- riots_new[t:(t+p-1),39] 
  wheat_Open1 <- riots_new[t:(t+p-1),40] 
  wheat_High1 <- riots_new[t:(t+p-1),41] 
  wheat_Low1 <- riots_new[t:(t+p-1),42] 
  tin_Close1 <- riots_new[t:(t+p-1),43] 
  tin_High1 <- riots_new[t:(t+p-1),44] 
  tin_Low1 <- riots_new[t:(t+p-1),45] 
  sugar_Close1 <- riots_new[t:(t+p-1),46] 
  sugar_Open1 <- riots_new[t:(t+p-1),47] 
  sugar_Low1 <- riots_new[t:(t+p-1),48] 
  soybean_Close1 <- riots_new[t:(t+p-1),49] 
  soybean_Open1 <- riots_new[t:(t+p-1),50] 
  soybean_High1 <- riots_new[t:(t+p-1),51] 
  soybean_Low1 <- riots_new[t:(t+p-1),52] 
  silver_High1 <- riots_new[t:(t+p-1),53] 
  rice_Close1 <- riots_new[t:(t+p-1),54] 
  rice_High1 <- riots_new[t:(t+p-1),55] 
  platinum_Close1 <- riots_new[t:(t+p-1),56] 
  natural_gas_Close1 <- riots_new[t:(t+p-1),57] 
  monero_Close1 <- riots_new[t:(t+p-1),58] 
  monero_High1 <- riots_new[t:(t+p-1),59] 
  litecoin_Close1 <- riots_new[t:(t+p-1),60] 
  litecoin_Open1 <- riots_new[t:(t+p-1),61] 
  litecoin_Low1 <- riots_new[t:(t+p-1),62] 
  lead_High1 <- riots_new[t:(t+p-1),63] 
  lead_Low1 <- riots_new[t:(t+p-1),64] 
  Gold_Low1 <- riots_new[t:(t+p-1),65] 
  cotton_Low1 <- riots_new[t:(t+p-1),66] 
  corn_High1 <- riots_new[t:(t+p-1),67] 
  corn_Low1 <- riots_new[t:(t+p-1),68] 
  copper_Close1 <- riots_new[t:(t+p-1),69] 
  copper_High1 <- riots_new[t:(t+p-1),70] 
  coffee_Close1 <- riots_new[t:(t+p-1),71] 
  coffee_High1 <- riots_new[t:(t+p-1),72] 
  live_cattle_Close1 <- riots_new[t:(t+p-1),73] 
  live_cattle_High1 <- riots_new[t:(t+p-1),74] 
  live_cattle_Low1 <- riots_new[t:(t+p-1),75] 
  feed_cattle_Close1 <- riots_new[t:(t+p-1),76] 
  feed_cattle_Open1 <- riots_new[t:(t+p-1),77] 
  Brent_Close1 <- riots_new[t:(t+p-1),78] 
  Brent_Open1 <- riots_new[t:(t+p-1),79] 
  Bitcoin_Open1 <- riots_new[t:(t+p-1),80] 
  Bitcoin_Low1 <- riots_new[t:(t+p-1),81] 
  BAX_Open1 <- riots_new[t:(t+p-1),82] 
  BAX_High1 <- riots_new[t:(t+p-1),83] 
  pca_count1 <- riots_new[t:(t+p-1), 84]
  
  ## For time t
  al_wafa_Total_Posts2 <- riots_new[(t+1):(t+p),1] 
  bahrain_moi_Total_Posts2 <- riots_new[(t+1):(t+p),2] 
  BahrainRights_Favorites2 <- riots_new[(t+1):(t+p),3] 
  BahrainRights_Retweets2 <- riots_new[(t+1):(t+p),4] 
  BBCArabic_Total_Posts2 <- riots_new[(t+1):(t+p),5] 
  BBCArabic_Favorites2 <- riots_new[(t+1):(t+p),6] 
  BBCArabic_Retweets2 <- riots_new[(t+1):(t+p),7] 
  bh14feb2011_Total_Posts2 <- riots_new[(t+1):(t+p),8] 
  bh14feb2011_Favorites2 <- riots_new[(t+1):(t+p),9] 
  bh14feb2011_Retweets2 <- riots_new[(t+1):(t+p),10] 
  bna_ar_Total_Posts2 <- riots_new[(t+1):(t+p),11] 
  bna_ar_Retweets2 <- riots_new[(t+1):(t+p),12] 
  Coalition14_Total_Posts2 <- riots_new[(t+1):(t+p),13] 
  Coalition14_Favorites2 <- riots_new[(t+1):(t+p),14] 
  duraz_youth_Total_Posts2 <- riots_new[(t+1):(t+p),15] 
  duraz_youth_Favorites2 <- riots_new[(t+1):(t+p),16] 
  duraz_youth_Retweets2 <- riots_new[(t+1):(t+p),17] 
  feb14revolution_Retweets2 <- riots_new[(t+1):(t+p),18] 
  GDNonline_Total_Posts2 <- riots_new[(t+1):(t+p),19] 
  GDNonline_Favorites2 <- riots_new[(t+1):(t+p),20] 
  GDNonline_Retweets2 <- riots_new[(t+1):(t+p),21] 
  Iran_Total_Posts2 <- riots_new[(t+1):(t+p),22] 
  Iran_Favorites2 <- riots_new[(t+1):(t+p),23] 
  IranNW_Retweets2 <- riots_new[(t+1):(t+p),24] 
  malarab1_Favorites2 <- riots_new[(t+1):(t+p),25] 
  NABEELRAJAB_Favorites2 <- riots_new[(t+1):(t+p),26] 
  netanyahu_Favorites2 <- riots_new[(t+1):(t+p),27] 
  netanyahu_Retweets2 <- riots_new[(t+1):(t+p),28] 
  rouhani_Total_Posts2 <- riots_new[(t+1):(t+p),29] 
  rouhani_Favorites2 <- riots_new[(t+1):(t+p),30] 
  USEmbassyManama_Retweets2 <- riots_new[(t+1):(t+p),31] 
  TEMP2 <- riots_new[(t+1):(t+p),32] 
  DEWP2 <- riots_new[(t+1):(t+p),33] 
  WDSP2 <- riots_new[(t+1):(t+p),34] 
  PRCP2 <- riots_new[(t+1):(t+p),35] 
  zinc_Open2 <- riots_new[(t+1):(t+p),36] 
  zinc_Low2 <- riots_new[(t+1):(t+p),37] 
  WTI_Close2 <- riots_new[(t+1):(t+p),38] 
  WTI_Low2 <- riots_new[(t+1):(t+p),39] 
  wheat_Open2 <- riots_new[(t+1):(t+p),40] 
  wheat_High2 <- riots_new[(t+1):(t+p),41] 
  wheat_Low2 <- riots_new[(t+1):(t+p),42] 
  tin_Close2 <- riots_new[(t+1):(t+p),43] 
  tin_High2 <- riots_new[(t+1):(t+p),44] 
  tin_Low2 <- riots_new[(t+1):(t+p),45] 
  sugar_Close2 <- riots_new[(t+1):(t+p),46] 
  sugar_Open2 <- riots_new[(t+1):(t+p),47] 
  sugar_Low2 <- riots_new[(t+1):(t+p),48] 
  soybean_Close2 <- riots_new[(t+1):(t+p),49] 
  soybean_Open2 <- riots_new[(t+1):(t+p),50] 
  soybean_High2 <- riots_new[(t+1):(t+p),51] 
  soybean_Low2 <- riots_new[(t+1):(t+p),52] 
  silver_High2 <- riots_new[(t+1):(t+p),53] 
  rice_Close2 <- riots_new[(t+1):(t+p),54] 
  rice_High2 <- riots_new[(t+1):(t+p),55] 
  platinum_Close2 <- riots_new[(t+1):(t+p),56] 
  natural_gas_Close2 <- riots_new[(t+1):(t+p),57] 
  monero_Close2 <- riots_new[(t+1):(t+p),58] 
  monero_High2 <- riots_new[(t+1):(t+p),59] 
  litecoin_Close2 <- riots_new[(t+1):(t+p),60] 
  litecoin_Open2 <- riots_new[(t+1):(t+p),61] 
  litecoin_Low2 <- riots_new[(t+1):(t+p),62] 
  lead_High2 <- riots_new[(t+1):(t+p),63] 
  lead_Low2 <- riots_new[(t+1):(t+p),64] 
  Gold_Low2 <- riots_new[(t+1):(t+p),65] 
  cotton_Low2 <- riots_new[(t+1):(t+p),66] 
  corn_High2 <- riots_new[(t+1):(t+p),67] 
  corn_Low2 <- riots_new[(t+1):(t+p),68] 
  copper_Close2 <- riots_new[(t+1):(t+p),69] 
  copper_High2 <- riots_new[(t+1):(t+p),70] 
  coffee_Close2 <- riots_new[(t+1):(t+p),71] 
  coffee_High2 <- riots_new[(t+1):(t+p),72] 
  live_cattle_Close2 <- riots_new[(t+1):(t+p),73] 
  live_cattle_High2 <- riots_new[(t+1):(t+p),74] 
  live_cattle_Low2 <- riots_new[(t+1):(t+p),75] 
  feed_cattle_Close2 <- riots_new[(t+1):(t+p),76] 
  feed_cattle_Open2 <- riots_new[(t+1):(t+p),77] 
  Brent_Close2 <- riots_new[(t+1):(t+p),78] 
  Brent_Open2 <- riots_new[(t+1):(t+p),79] 
  Bitcoin_Open2 <- riots_new[(t+1):(t+p),80] 
  Bitcoin_Low2 <- riots_new[(t+1):(t+p),81] 
  BAX_Open2 <- riots_new[(t+1):(t+p),82] 
  BAX_High2 <- riots_new[(t+1):(t+p),83]
  pca_count2 <- riots_new[(t+1):(t+p), 84]
  
  # Now define the target
  pca_count3 <- riots_new[(t+1+s):(t+p+s), 84]
  
  test <- data.frame(al_wafa_Total_Posts1, bahrain_moi_Total_Posts1, BahrainRights_Favorites1, BahrainRights_Retweets1, BBCArabic_Total_Posts1, BBCArabic_Favorites1, BBCArabic_Retweets1, bh14feb2011_Total_Posts1, bh14feb2011_Favorites1, bh14feb2011_Retweets1, bna_ar_Total_Posts1, bna_ar_Retweets1, Coalition14_Total_Posts1, Coalition14_Favorites1, duraz_youth_Total_Posts1, duraz_youth_Favorites1, duraz_youth_Retweets1, feb14revolution_Retweets1, GDNonline_Total_Posts1, GDNonline_Favorites1, GDNonline_Retweets1, Iran_Total_Posts1, Iran_Favorites1, IranNW_Retweets1, malarab1_Favorites1, NABEELRAJAB_Favorites1, netanyahu_Favorites1, netanyahu_Retweets1, rouhani_Total_Posts1, rouhani_Favorites1, USEmbassyManama_Retweets1, TEMP1, DEWP1, WDSP1, PRCP1, zinc_Open1, zinc_Low1, WTI_Close1, WTI_Low1, wheat_Open1, wheat_High1, wheat_Low1, tin_Close1, tin_High1, tin_Low1, sugar_Close1, sugar_Open1, sugar_Low1, soybean_Close1, soybean_Open1, soybean_High1, soybean_Low1, silver_High1, rice_Close1, rice_High1, platinum_Close1, natural_gas_Close1, monero_Close1, monero_High1, litecoin_Close1, litecoin_Open1, litecoin_Low1, lead_High1, lead_Low1, Gold_Low1, cotton_Low1, corn_High1, corn_Low1, copper_Close1, copper_High1, coffee_Close1, coffee_High1, live_cattle_Close1, live_cattle_High1, live_cattle_Low1, feed_cattle_Close1, feed_cattle_Open1, Brent_Close1, Brent_Open1, Bitcoin_Open1, Bitcoin_Low1, BAX_Open1, BAX_High1, pca_count1, al_wafa_Total_Posts2, bahrain_moi_Total_Posts2, BahrainRights_Favorites2, BahrainRights_Retweets2, BBCArabic_Total_Posts2, BBCArabic_Favorites2, BBCArabic_Retweets2, bh14feb2011_Total_Posts2, bh14feb2011_Favorites2, bh14feb2011_Retweets2, bna_ar_Total_Posts2, bna_ar_Retweets2, Coalition14_Total_Posts2, Coalition14_Favorites2, duraz_youth_Total_Posts2, duraz_youth_Favorites2, duraz_youth_Retweets2, feb14revolution_Retweets2, GDNonline_Total_Posts2, GDNonline_Favorites2, GDNonline_Retweets2, Iran_Total_Posts2, Iran_Favorites2, IranNW_Retweets2, malarab1_Favorites2, NABEELRAJAB_Favorites2, netanyahu_Favorites2, netanyahu_Retweets2, rouhani_Total_Posts2, rouhani_Favorites2, USEmbassyManama_Retweets2, TEMP2, DEWP2, WDSP2, PRCP2, zinc_Open2, zinc_Low2, WTI_Close2, WTI_Low2, wheat_Open2, wheat_High2, wheat_Low2, tin_Close2, tin_High2, tin_Low2, sugar_Close2, sugar_Open2, sugar_Low2, soybean_Close2, soybean_Open2, soybean_High2, soybean_Low2, silver_High2, rice_Close2, rice_High2, platinum_Close2, natural_gas_Close2, monero_Close2, monero_High2, litecoin_Close2, litecoin_Open2, litecoin_Low2, lead_High2, lead_Low2, Gold_Low2, cotton_Low2, corn_High2, corn_Low2, copper_Close2, copper_High2, coffee_Close2, coffee_High2, live_cattle_Close2, live_cattle_High2, live_cattle_Low2, feed_cattle_Close2, feed_cattle_Open2, Brent_Close2, Brent_Open2, Bitcoin_Open2, Bitcoin_Low2, BAX_Open2, BAX_High2, pca_count2, pca_count3)
  
  ## Applying RF

  RandomForest <- randomForest(pca_count3 ~ ., data=train, importance = TRUE, ntrees = 500)
  predRF <- predict(RandomForest, newdata = test, type = "response")
  summary(predRF)
  ## MSE
  mse <- sum((predRF - test$pca_count3)^2)/length(test$pca_count3)
  mse
  ## MAPE
  ## Finding zero riot_count
  index <- which(test$pca_count3 == 0)
  ## MAPE
  mape <- sum(abs(predRF[-index] - test$pca_count3[-index])/test$pca_ount3[-index])/length(test$pca_count3[-index])
  mape
  RF.MSE[[k]] <- mse
  RF.MAPE[[k]] <- mape
}

### Computing the overall MSE and MAPE
RF.MSE.Table <- do.call(rbind, RF.MSE)
RF.MAPE.Table <- do.call(rbind, RF.MAPE)

The overall MSE and MAPE are:

## [1] "The overall MASE is 6.9006"
## [1] "The overall MAPE is 0.0278"

First, we plot how it did with the train set. Black line is the truth and red line is the prediction.

plot(1:length(train$pca_count3), round(train$pca_count3), type="l", main = 'Rolling Horizon Bahrain PCA (01JAN16 - 15DEC19)', xlab = 'Date', ylab = 'Count')
lines(1:length(RandomForest$predicted), round(RandomForest$predicted), col = "red")
legend(900, 29, legend=c("True Count", "Rolling Hoizon/RF Forecast"),
       col=c("black", "red"),  lty=1:1, cex=0.8)

Now we plot how it did with the test set. Black line is the truth and red line is the prediction.

plot(1:length(test$pca_count3), test$pca_count3, type="l", ylim = c(0,5), main = 'Rolling Horizon Bahrain PCA (Test Set)', xlab = 'Date', ylab = 'Count')
lines(1:length(predRF), predRF, col = "red")
legend(4.5, 1, legend=c("True Count", "Rolling Hoizon/RF Forecast"),
       col=c("black", "red"),  lty=1:1, cex=0.8)

Despite the very low MAPE for the PCA model, our model performs poorly when forecasting on either the training set or the test set.

15 Random Forest (Categorical Data)

# Code sourced from: https://stackoverflow.com/questions/61955696/calculating-true-false-positive-and-true-false-negative-values-from-matrix-in-r

# Creates a dataframe and determines the true positive, true negative, false positive, and false negative statistics of a confusion matrix
multi_class_rates <- function(confusion_matrix) {
    true_positives  <- diag(confusion_matrix)
    false_positives <- colSums(confusion_matrix) - true_positives
    false_negatives <- rowSums(confusion_matrix) - true_positives
    true_negatives  <- sum(confusion_matrix) - true_positives -
        false_positives - false_negatives
    return(data.frame(true_positives, false_positives, true_negatives,
                      false_negatives, row.names = names(true_positives)))
}
# Read in the file
master_df <- read.csv('master_df.csv')

# Deletes the event date column
master_df$event_date <- NULL
# Code sourced from: https://stackoverflow.com/questions/40380112/categorize-continuous-variable-with-dplyr
# Establishes categorical boundaries of:
# very low risk: -Infinity (0) to 2
# low risk 3 to 6
# moderate risk 7 to 10
# high risk 11 to 16
# very high risk 17 to Infinity
master_df$category <- as.factor(as.character(cut(master_df$riot_count, breaks=c(-Inf, 1.9, 7.9, Inf), labels=c("Low_risk", "Moderate_risk", "High_risk"))))

# Determine the sum for each category
sum1 <- sum(master_df$category=='Low_risk')
sum2 <- sum(master_df$category=='Moderate_risk')
sum3 <- sum(master_df$category=='High_risk')

# Create a bar plot depicting the categories
xx <- barplot(c(sum1, sum2, sum3), ylim = c(0, 900), ylab = "Numerical Count", xlab = "Category",  col = c('darkgreen', 'lightgreen', 'yellow', 'red', 'darkred'), axes = TRUE, main = "Bar Plot for Categorical Data for Bahrain Riots (2016-2019)", names.arg = c("Low Risk", "Moderate Risk", "High Risk"))

# Code sourced from: https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781783988785/6/ch06lvl1sec69/displaying-values-on-top-of-or-next-to-the-bars
# Create a matrix with the sum of each category
y<-as.matrix(c(sum1, sum2, sum3))

# Create a text block to depict the sum above each column on the bar plot
text(xx,y+50,labels=as.character(y))

# Partition of data set into 80% Train and 20% Test datasets
sampler <- sample(nrow(master_df),trunc(nrow(master_df)*.80))

# Subset the dataframe
riots_new <- master_df[,c(backward.adjr2.formula,"protest_count", "VAC_count", "total_fatalities", "category")]

# Create train and test splits
master_df.Train <- riots_new[sampler,]
master_df.Test <- riots_new[-sampler,]

# Apply random forests model having category target and rest as predictors
RandomForest <- randomForest(category ~ ., data=master_df.Train, importance = TRUE, ntrees = nrow(master_df.Train))

# Get the predicted category
predClassRF <- predict(RandomForest, newdata = master_df.Test, type = "response")

# Report confusion matrix from on the test dataset
confusionMatrix(predClassRF, master_df.Test$category)
## Confusion Matrix and Statistics
## 
##                Reference
## Prediction      High_risk Low_risk Moderate_risk
##   High_risk             4        1             2
##   Low_risk              0      103            45
##   Moderate_risk        10       42            86
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6587          
##                  95% CI : (0.6013, 0.7128)
##     No Information Rate : 0.4983          
##     P-Value [Acc > NIR] : 2.164e-08       
##                                           
##                   Kappa : 0.3601          
##                                           
##  Mcnemar's Test P-Value : 0.09219         
## 
## Statistics by Class:
## 
##                      Class: High_risk Class: Low_risk Class: Moderate_risk
## Sensitivity                   0.28571          0.7055               0.6466
## Specificity                   0.98925          0.6939               0.6750
## Pos Pred Value                0.57143          0.6959               0.6232
## Neg Pred Value                0.96503          0.7034               0.6968
## Prevalence                    0.04778          0.4983               0.4539
## Detection Rate                0.01365          0.3515               0.2935
## Detection Prevalence          0.02389          0.5051               0.4710
## Balanced Accuracy             0.63748          0.6997               0.6608
# Plot the results of random forest
plot(master_df.Test$category, predClassRF)

# Generate a summary of the Random Forest summary
summary(RandomForest)
##                 Length Class  Mode     
## call               5   -none- call     
## type               1   -none- character
## predicted       1168   factor numeric  
## err.rate        2000   -none- numeric  
## confusion         12   -none- numeric  
## votes           3504   matrix numeric  
## oob.times       1168   -none- numeric  
## classes            3   -none- character
## importance       430   -none- numeric  
## importanceSD     344   -none- numeric  
## localImportance    0   -none- NULL     
## proximity          0   -none- NULL     
## ntree              1   -none- numeric  
## mtry               1   -none- numeric  
## forest            14   -none- list     
## y               1168   factor numeric  
## test               0   -none- NULL     
## inbag              0   -none- NULL     
## terms              3   terms  call
# First: 1:(t-1)

i <- 1 # Initialize a counter
while(i <= length(riots_new)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(colnames(riots_new)[i],"1 <- riots_new[1:(t-1),", i, "]"), "\n") 
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts1 <- riots_new[1:(t-1),1] 
## bahrain_moi_Total_Posts1 <- riots_new[1:(t-1),2] 
## BahrainRights_Favorites1 <- riots_new[1:(t-1),3] 
## BahrainRights_Retweets1 <- riots_new[1:(t-1),4] 
## BBCArabic_Total_Posts1 <- riots_new[1:(t-1),5] 
## BBCArabic_Favorites1 <- riots_new[1:(t-1),6] 
## BBCArabic_Retweets1 <- riots_new[1:(t-1),7] 
## bh14feb2011_Total_Posts1 <- riots_new[1:(t-1),8] 
## bh14feb2011_Favorites1 <- riots_new[1:(t-1),9] 
## bh14feb2011_Retweets1 <- riots_new[1:(t-1),10] 
## bna_ar_Total_Posts1 <- riots_new[1:(t-1),11] 
## bna_ar_Retweets1 <- riots_new[1:(t-1),12] 
## Coalition14_Total_Posts1 <- riots_new[1:(t-1),13] 
## Coalition14_Favorites1 <- riots_new[1:(t-1),14] 
## duraz_youth_Total_Posts1 <- riots_new[1:(t-1),15] 
## duraz_youth_Favorites1 <- riots_new[1:(t-1),16] 
## duraz_youth_Retweets1 <- riots_new[1:(t-1),17] 
## feb14revolution_Retweets1 <- riots_new[1:(t-1),18] 
## GDNonline_Total_Posts1 <- riots_new[1:(t-1),19] 
## GDNonline_Favorites1 <- riots_new[1:(t-1),20] 
## GDNonline_Retweets1 <- riots_new[1:(t-1),21] 
## Iran_Total_Posts1 <- riots_new[1:(t-1),22] 
## Iran_Favorites1 <- riots_new[1:(t-1),23] 
## IranNW_Retweets1 <- riots_new[1:(t-1),24] 
## malarab1_Favorites1 <- riots_new[1:(t-1),25] 
## NABEELRAJAB_Favorites1 <- riots_new[1:(t-1),26] 
## netanyahu_Favorites1 <- riots_new[1:(t-1),27] 
## netanyahu_Retweets1 <- riots_new[1:(t-1),28] 
## rouhani_Total_Posts1 <- riots_new[1:(t-1),29] 
## rouhani_Favorites1 <- riots_new[1:(t-1),30] 
## USEmbassyManama_Retweets1 <- riots_new[1:(t-1),31] 
## TEMP1 <- riots_new[1:(t-1),32] 
## DEWP1 <- riots_new[1:(t-1),33] 
## WDSP1 <- riots_new[1:(t-1),34] 
## PRCP1 <- riots_new[1:(t-1),35] 
## zinc_Open1 <- riots_new[1:(t-1),36] 
## zinc_Low1 <- riots_new[1:(t-1),37] 
## WTI_Close1 <- riots_new[1:(t-1),38] 
## WTI_Low1 <- riots_new[1:(t-1),39] 
## wheat_Open1 <- riots_new[1:(t-1),40] 
## wheat_High1 <- riots_new[1:(t-1),41] 
## wheat_Low1 <- riots_new[1:(t-1),42] 
## tin_Close1 <- riots_new[1:(t-1),43] 
## tin_High1 <- riots_new[1:(t-1),44] 
## tin_Low1 <- riots_new[1:(t-1),45] 
## sugar_Close1 <- riots_new[1:(t-1),46] 
## sugar_Open1 <- riots_new[1:(t-1),47] 
## sugar_Low1 <- riots_new[1:(t-1),48] 
## soybean_Close1 <- riots_new[1:(t-1),49] 
## soybean_Open1 <- riots_new[1:(t-1),50] 
## soybean_High1 <- riots_new[1:(t-1),51] 
## soybean_Low1 <- riots_new[1:(t-1),52] 
## silver_High1 <- riots_new[1:(t-1),53] 
## rice_Close1 <- riots_new[1:(t-1),54] 
## rice_High1 <- riots_new[1:(t-1),55] 
## platinum_Close1 <- riots_new[1:(t-1),56] 
## natural_gas_Close1 <- riots_new[1:(t-1),57] 
## monero_Close1 <- riots_new[1:(t-1),58] 
## monero_High1 <- riots_new[1:(t-1),59] 
## litecoin_Close1 <- riots_new[1:(t-1),60] 
## litecoin_Open1 <- riots_new[1:(t-1),61] 
## litecoin_Low1 <- riots_new[1:(t-1),62] 
## lead_High1 <- riots_new[1:(t-1),63] 
## lead_Low1 <- riots_new[1:(t-1),64] 
## Gold_Low1 <- riots_new[1:(t-1),65] 
## cotton_Low1 <- riots_new[1:(t-1),66] 
## corn_High1 <- riots_new[1:(t-1),67] 
## corn_Low1 <- riots_new[1:(t-1),68] 
## copper_Close1 <- riots_new[1:(t-1),69] 
## copper_High1 <- riots_new[1:(t-1),70] 
## coffee_Close1 <- riots_new[1:(t-1),71] 
## coffee_High1 <- riots_new[1:(t-1),72] 
## live_cattle_Close1 <- riots_new[1:(t-1),73] 
## live_cattle_High1 <- riots_new[1:(t-1),74] 
## live_cattle_Low1 <- riots_new[1:(t-1),75] 
## feed_cattle_Close1 <- riots_new[1:(t-1),76] 
## feed_cattle_Open1 <- riots_new[1:(t-1),77] 
## Brent_Close1 <- riots_new[1:(t-1),78] 
## Brent_Open1 <- riots_new[1:(t-1),79] 
## Bitcoin_Open1 <- riots_new[1:(t-1),80] 
## Bitcoin_Low1 <- riots_new[1:(t-1),81] 
## BAX_Open1 <- riots_new[1:(t-1),82] 
## BAX_High1 <- riots_new[1:(t-1),83] 
## protest_count1 <- riots_new[1:(t-1),84] 
## VAC_count1 <- riots_new[1:(t-1),85] 
## total_fatalities1 <- riots_new[1:(t-1),86] 
## category1 <- riots_new[1:(t-1),87]
# Second: 2:t

i <- 1 # Initialize a counter
while(i <= length(riots_new)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(colnames(riots_new)[i],"2 <- riots_new[2:t,", i, "]"), "\n") 
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts2 <- riots_new[2:t,1] 
## bahrain_moi_Total_Posts2 <- riots_new[2:t,2] 
## BahrainRights_Favorites2 <- riots_new[2:t,3] 
## BahrainRights_Retweets2 <- riots_new[2:t,4] 
## BBCArabic_Total_Posts2 <- riots_new[2:t,5] 
## BBCArabic_Favorites2 <- riots_new[2:t,6] 
## BBCArabic_Retweets2 <- riots_new[2:t,7] 
## bh14feb2011_Total_Posts2 <- riots_new[2:t,8] 
## bh14feb2011_Favorites2 <- riots_new[2:t,9] 
## bh14feb2011_Retweets2 <- riots_new[2:t,10] 
## bna_ar_Total_Posts2 <- riots_new[2:t,11] 
## bna_ar_Retweets2 <- riots_new[2:t,12] 
## Coalition14_Total_Posts2 <- riots_new[2:t,13] 
## Coalition14_Favorites2 <- riots_new[2:t,14] 
## duraz_youth_Total_Posts2 <- riots_new[2:t,15] 
## duraz_youth_Favorites2 <- riots_new[2:t,16] 
## duraz_youth_Retweets2 <- riots_new[2:t,17] 
## feb14revolution_Retweets2 <- riots_new[2:t,18] 
## GDNonline_Total_Posts2 <- riots_new[2:t,19] 
## GDNonline_Favorites2 <- riots_new[2:t,20] 
## GDNonline_Retweets2 <- riots_new[2:t,21] 
## Iran_Total_Posts2 <- riots_new[2:t,22] 
## Iran_Favorites2 <- riots_new[2:t,23] 
## IranNW_Retweets2 <- riots_new[2:t,24] 
## malarab1_Favorites2 <- riots_new[2:t,25] 
## NABEELRAJAB_Favorites2 <- riots_new[2:t,26] 
## netanyahu_Favorites2 <- riots_new[2:t,27] 
## netanyahu_Retweets2 <- riots_new[2:t,28] 
## rouhani_Total_Posts2 <- riots_new[2:t,29] 
## rouhani_Favorites2 <- riots_new[2:t,30] 
## USEmbassyManama_Retweets2 <- riots_new[2:t,31] 
## TEMP2 <- riots_new[2:t,32] 
## DEWP2 <- riots_new[2:t,33] 
## WDSP2 <- riots_new[2:t,34] 
## PRCP2 <- riots_new[2:t,35] 
## zinc_Open2 <- riots_new[2:t,36] 
## zinc_Low2 <- riots_new[2:t,37] 
## WTI_Close2 <- riots_new[2:t,38] 
## WTI_Low2 <- riots_new[2:t,39] 
## wheat_Open2 <- riots_new[2:t,40] 
## wheat_High2 <- riots_new[2:t,41] 
## wheat_Low2 <- riots_new[2:t,42] 
## tin_Close2 <- riots_new[2:t,43] 
## tin_High2 <- riots_new[2:t,44] 
## tin_Low2 <- riots_new[2:t,45] 
## sugar_Close2 <- riots_new[2:t,46] 
## sugar_Open2 <- riots_new[2:t,47] 
## sugar_Low2 <- riots_new[2:t,48] 
## soybean_Close2 <- riots_new[2:t,49] 
## soybean_Open2 <- riots_new[2:t,50] 
## soybean_High2 <- riots_new[2:t,51] 
## soybean_Low2 <- riots_new[2:t,52] 
## silver_High2 <- riots_new[2:t,53] 
## rice_Close2 <- riots_new[2:t,54] 
## rice_High2 <- riots_new[2:t,55] 
## platinum_Close2 <- riots_new[2:t,56] 
## natural_gas_Close2 <- riots_new[2:t,57] 
## monero_Close2 <- riots_new[2:t,58] 
## monero_High2 <- riots_new[2:t,59] 
## litecoin_Close2 <- riots_new[2:t,60] 
## litecoin_Open2 <- riots_new[2:t,61] 
## litecoin_Low2 <- riots_new[2:t,62] 
## lead_High2 <- riots_new[2:t,63] 
## lead_Low2 <- riots_new[2:t,64] 
## Gold_Low2 <- riots_new[2:t,65] 
## cotton_Low2 <- riots_new[2:t,66] 
## corn_High2 <- riots_new[2:t,67] 
## corn_Low2 <- riots_new[2:t,68] 
## copper_Close2 <- riots_new[2:t,69] 
## copper_High2 <- riots_new[2:t,70] 
## coffee_Close2 <- riots_new[2:t,71] 
## coffee_High2 <- riots_new[2:t,72] 
## live_cattle_Close2 <- riots_new[2:t,73] 
## live_cattle_High2 <- riots_new[2:t,74] 
## live_cattle_Low2 <- riots_new[2:t,75] 
## feed_cattle_Close2 <- riots_new[2:t,76] 
## feed_cattle_Open2 <- riots_new[2:t,77] 
## Brent_Close2 <- riots_new[2:t,78] 
## Brent_Open2 <- riots_new[2:t,79] 
## Bitcoin_Open2 <- riots_new[2:t,80] 
## Bitcoin_Low2 <- riots_new[2:t,81] 
## BAX_Open2 <- riots_new[2:t,82] 
## BAX_High2 <- riots_new[2:t,83] 
## protest_count2 <- riots_new[2:t,84] 
## VAC_count2 <- riots_new[2:t,85] 
## total_fatalities2 <- riots_new[2:t,86] 
## category2 <- riots_new[2:t,87]
# Third: First part of training dataframe

i <- 1 # Initialize a counter
while(i <= length(riots_new)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(colnames(riots_new)[i],"1, "))
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts1, bahrain_moi_Total_Posts1, BahrainRights_Favorites1, BahrainRights_Retweets1, BBCArabic_Total_Posts1, BBCArabic_Favorites1, BBCArabic_Retweets1, bh14feb2011_Total_Posts1, bh14feb2011_Favorites1, bh14feb2011_Retweets1, bna_ar_Total_Posts1, bna_ar_Retweets1, Coalition14_Total_Posts1, Coalition14_Favorites1, duraz_youth_Total_Posts1, duraz_youth_Favorites1, duraz_youth_Retweets1, feb14revolution_Retweets1, GDNonline_Total_Posts1, GDNonline_Favorites1, GDNonline_Retweets1, Iran_Total_Posts1, Iran_Favorites1, IranNW_Retweets1, malarab1_Favorites1, NABEELRAJAB_Favorites1, netanyahu_Favorites1, netanyahu_Retweets1, rouhani_Total_Posts1, rouhani_Favorites1, USEmbassyManama_Retweets1, TEMP1, DEWP1, WDSP1, PRCP1, zinc_Open1, zinc_Low1, WTI_Close1, WTI_Low1, wheat_Open1, wheat_High1, wheat_Low1, tin_Close1, tin_High1, tin_Low1, sugar_Close1, sugar_Open1, sugar_Low1, soybean_Close1, soybean_Open1, soybean_High1, soybean_Low1, silver_High1, rice_Close1, rice_High1, platinum_Close1, natural_gas_Close1, monero_Close1, monero_High1, litecoin_Close1, litecoin_Open1, litecoin_Low1, lead_High1, lead_Low1, Gold_Low1, cotton_Low1, corn_High1, corn_Low1, copper_Close1, copper_High1, coffee_Close1, coffee_High1, live_cattle_Close1, live_cattle_High1, live_cattle_Low1, feed_cattle_Close1, feed_cattle_Open1, Brent_Close1, Brent_Open1, Bitcoin_Open1, Bitcoin_Low1, BAX_Open1, BAX_High1, protest_count1, VAC_count1, total_fatalities1, category1,
# Fourth: Second part of training dataframe

i <- 1 # Initialize a counter
while(i <= length(riots_new)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(colnames(riots_new)[i],"2, "))
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts2, bahrain_moi_Total_Posts2, BahrainRights_Favorites2, BahrainRights_Retweets2, BBCArabic_Total_Posts2, BBCArabic_Favorites2, BBCArabic_Retweets2, bh14feb2011_Total_Posts2, bh14feb2011_Favorites2, bh14feb2011_Retweets2, bna_ar_Total_Posts2, bna_ar_Retweets2, Coalition14_Total_Posts2, Coalition14_Favorites2, duraz_youth_Total_Posts2, duraz_youth_Favorites2, duraz_youth_Retweets2, feb14revolution_Retweets2, GDNonline_Total_Posts2, GDNonline_Favorites2, GDNonline_Retweets2, Iran_Total_Posts2, Iran_Favorites2, IranNW_Retweets2, malarab1_Favorites2, NABEELRAJAB_Favorites2, netanyahu_Favorites2, netanyahu_Retweets2, rouhani_Total_Posts2, rouhani_Favorites2, USEmbassyManama_Retweets2, TEMP2, DEWP2, WDSP2, PRCP2, zinc_Open2, zinc_Low2, WTI_Close2, WTI_Low2, wheat_Open2, wheat_High2, wheat_Low2, tin_Close2, tin_High2, tin_Low2, sugar_Close2, sugar_Open2, sugar_Low2, soybean_Close2, soybean_Open2, soybean_High2, soybean_Low2, silver_High2, rice_Close2, rice_High2, platinum_Close2, natural_gas_Close2, monero_Close2, monero_High2, litecoin_Close2, litecoin_Open2, litecoin_Low2, lead_High2, lead_Low2, Gold_Low2, cotton_Low2, corn_High2, corn_Low2, copper_Close2, copper_High2, coffee_Close2, coffee_High2, live_cattle_Close2, live_cattle_High2, live_cattle_Low2, feed_cattle_Close2, feed_cattle_Open2, Brent_Close2, Brent_Open2, Bitcoin_Open2, Bitcoin_Low2, BAX_Open2, BAX_High2, protest_count2, VAC_count2, total_fatalities2, category2,
# Fifth: t:(t+p-1)

i <- 1 # Initialize a counter
while(i <= length(riots_new)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(colnames(riots_new)[i],"1 <- riots_new[t:(t+p-1),", i, "]"), "\n") 
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts1 <- riots_new[t:(t+p-1),1] 
## bahrain_moi_Total_Posts1 <- riots_new[t:(t+p-1),2] 
## BahrainRights_Favorites1 <- riots_new[t:(t+p-1),3] 
## BahrainRights_Retweets1 <- riots_new[t:(t+p-1),4] 
## BBCArabic_Total_Posts1 <- riots_new[t:(t+p-1),5] 
## BBCArabic_Favorites1 <- riots_new[t:(t+p-1),6] 
## BBCArabic_Retweets1 <- riots_new[t:(t+p-1),7] 
## bh14feb2011_Total_Posts1 <- riots_new[t:(t+p-1),8] 
## bh14feb2011_Favorites1 <- riots_new[t:(t+p-1),9] 
## bh14feb2011_Retweets1 <- riots_new[t:(t+p-1),10] 
## bna_ar_Total_Posts1 <- riots_new[t:(t+p-1),11] 
## bna_ar_Retweets1 <- riots_new[t:(t+p-1),12] 
## Coalition14_Total_Posts1 <- riots_new[t:(t+p-1),13] 
## Coalition14_Favorites1 <- riots_new[t:(t+p-1),14] 
## duraz_youth_Total_Posts1 <- riots_new[t:(t+p-1),15] 
## duraz_youth_Favorites1 <- riots_new[t:(t+p-1),16] 
## duraz_youth_Retweets1 <- riots_new[t:(t+p-1),17] 
## feb14revolution_Retweets1 <- riots_new[t:(t+p-1),18] 
## GDNonline_Total_Posts1 <- riots_new[t:(t+p-1),19] 
## GDNonline_Favorites1 <- riots_new[t:(t+p-1),20] 
## GDNonline_Retweets1 <- riots_new[t:(t+p-1),21] 
## Iran_Total_Posts1 <- riots_new[t:(t+p-1),22] 
## Iran_Favorites1 <- riots_new[t:(t+p-1),23] 
## IranNW_Retweets1 <- riots_new[t:(t+p-1),24] 
## malarab1_Favorites1 <- riots_new[t:(t+p-1),25] 
## NABEELRAJAB_Favorites1 <- riots_new[t:(t+p-1),26] 
## netanyahu_Favorites1 <- riots_new[t:(t+p-1),27] 
## netanyahu_Retweets1 <- riots_new[t:(t+p-1),28] 
## rouhani_Total_Posts1 <- riots_new[t:(t+p-1),29] 
## rouhani_Favorites1 <- riots_new[t:(t+p-1),30] 
## USEmbassyManama_Retweets1 <- riots_new[t:(t+p-1),31] 
## TEMP1 <- riots_new[t:(t+p-1),32] 
## DEWP1 <- riots_new[t:(t+p-1),33] 
## WDSP1 <- riots_new[t:(t+p-1),34] 
## PRCP1 <- riots_new[t:(t+p-1),35] 
## zinc_Open1 <- riots_new[t:(t+p-1),36] 
## zinc_Low1 <- riots_new[t:(t+p-1),37] 
## WTI_Close1 <- riots_new[t:(t+p-1),38] 
## WTI_Low1 <- riots_new[t:(t+p-1),39] 
## wheat_Open1 <- riots_new[t:(t+p-1),40] 
## wheat_High1 <- riots_new[t:(t+p-1),41] 
## wheat_Low1 <- riots_new[t:(t+p-1),42] 
## tin_Close1 <- riots_new[t:(t+p-1),43] 
## tin_High1 <- riots_new[t:(t+p-1),44] 
## tin_Low1 <- riots_new[t:(t+p-1),45] 
## sugar_Close1 <- riots_new[t:(t+p-1),46] 
## sugar_Open1 <- riots_new[t:(t+p-1),47] 
## sugar_Low1 <- riots_new[t:(t+p-1),48] 
## soybean_Close1 <- riots_new[t:(t+p-1),49] 
## soybean_Open1 <- riots_new[t:(t+p-1),50] 
## soybean_High1 <- riots_new[t:(t+p-1),51] 
## soybean_Low1 <- riots_new[t:(t+p-1),52] 
## silver_High1 <- riots_new[t:(t+p-1),53] 
## rice_Close1 <- riots_new[t:(t+p-1),54] 
## rice_High1 <- riots_new[t:(t+p-1),55] 
## platinum_Close1 <- riots_new[t:(t+p-1),56] 
## natural_gas_Close1 <- riots_new[t:(t+p-1),57] 
## monero_Close1 <- riots_new[t:(t+p-1),58] 
## monero_High1 <- riots_new[t:(t+p-1),59] 
## litecoin_Close1 <- riots_new[t:(t+p-1),60] 
## litecoin_Open1 <- riots_new[t:(t+p-1),61] 
## litecoin_Low1 <- riots_new[t:(t+p-1),62] 
## lead_High1 <- riots_new[t:(t+p-1),63] 
## lead_Low1 <- riots_new[t:(t+p-1),64] 
## Gold_Low1 <- riots_new[t:(t+p-1),65] 
## cotton_Low1 <- riots_new[t:(t+p-1),66] 
## corn_High1 <- riots_new[t:(t+p-1),67] 
## corn_Low1 <- riots_new[t:(t+p-1),68] 
## copper_Close1 <- riots_new[t:(t+p-1),69] 
## copper_High1 <- riots_new[t:(t+p-1),70] 
## coffee_Close1 <- riots_new[t:(t+p-1),71] 
## coffee_High1 <- riots_new[t:(t+p-1),72] 
## live_cattle_Close1 <- riots_new[t:(t+p-1),73] 
## live_cattle_High1 <- riots_new[t:(t+p-1),74] 
## live_cattle_Low1 <- riots_new[t:(t+p-1),75] 
## feed_cattle_Close1 <- riots_new[t:(t+p-1),76] 
## feed_cattle_Open1 <- riots_new[t:(t+p-1),77] 
## Brent_Close1 <- riots_new[t:(t+p-1),78] 
## Brent_Open1 <- riots_new[t:(t+p-1),79] 
## Bitcoin_Open1 <- riots_new[t:(t+p-1),80] 
## Bitcoin_Low1 <- riots_new[t:(t+p-1),81] 
## BAX_Open1 <- riots_new[t:(t+p-1),82] 
## BAX_High1 <- riots_new[t:(t+p-1),83] 
## protest_count1 <- riots_new[t:(t+p-1),84] 
## VAC_count1 <- riots_new[t:(t+p-1),85] 
## total_fatalities1 <- riots_new[t:(t+p-1),86] 
## category1 <- riots_new[t:(t+p-1),87]
# Sixth: (t+1):(t+p)

i <- 1 # Initialize a counter
while(i <= length(riots_new)){ # Iterate over the length of the formula
  # Create print statements that are easily copy and pasted below
  cat(paste0(colnames(riots_new)[i],"2 <- riots_new[(t+1):(t+p),", i, "]"), "\n") 
  i = i + 1 # Increment a counter
}
## al_wafa_Total_Posts2 <- riots_new[(t+1):(t+p),1] 
## bahrain_moi_Total_Posts2 <- riots_new[(t+1):(t+p),2] 
## BahrainRights_Favorites2 <- riots_new[(t+1):(t+p),3] 
## BahrainRights_Retweets2 <- riots_new[(t+1):(t+p),4] 
## BBCArabic_Total_Posts2 <- riots_new[(t+1):(t+p),5] 
## BBCArabic_Favorites2 <- riots_new[(t+1):(t+p),6] 
## BBCArabic_Retweets2 <- riots_new[(t+1):(t+p),7] 
## bh14feb2011_Total_Posts2 <- riots_new[(t+1):(t+p),8] 
## bh14feb2011_Favorites2 <- riots_new[(t+1):(t+p),9] 
## bh14feb2011_Retweets2 <- riots_new[(t+1):(t+p),10] 
## bna_ar_Total_Posts2 <- riots_new[(t+1):(t+p),11] 
## bna_ar_Retweets2 <- riots_new[(t+1):(t+p),12] 
## Coalition14_Total_Posts2 <- riots_new[(t+1):(t+p),13] 
## Coalition14_Favorites2 <- riots_new[(t+1):(t+p),14] 
## duraz_youth_Total_Posts2 <- riots_new[(t+1):(t+p),15] 
## duraz_youth_Favorites2 <- riots_new[(t+1):(t+p),16] 
## duraz_youth_Retweets2 <- riots_new[(t+1):(t+p),17] 
## feb14revolution_Retweets2 <- riots_new[(t+1):(t+p),18] 
## GDNonline_Total_Posts2 <- riots_new[(t+1):(t+p),19] 
## GDNonline_Favorites2 <- riots_new[(t+1):(t+p),20] 
## GDNonline_Retweets2 <- riots_new[(t+1):(t+p),21] 
## Iran_Total_Posts2 <- riots_new[(t+1):(t+p),22] 
## Iran_Favorites2 <- riots_new[(t+1):(t+p),23] 
## IranNW_Retweets2 <- riots_new[(t+1):(t+p),24] 
## malarab1_Favorites2 <- riots_new[(t+1):(t+p),25] 
## NABEELRAJAB_Favorites2 <- riots_new[(t+1):(t+p),26] 
## netanyahu_Favorites2 <- riots_new[(t+1):(t+p),27] 
## netanyahu_Retweets2 <- riots_new[(t+1):(t+p),28] 
## rouhani_Total_Posts2 <- riots_new[(t+1):(t+p),29] 
## rouhani_Favorites2 <- riots_new[(t+1):(t+p),30] 
## USEmbassyManama_Retweets2 <- riots_new[(t+1):(t+p),31] 
## TEMP2 <- riots_new[(t+1):(t+p),32] 
## DEWP2 <- riots_new[(t+1):(t+p),33] 
## WDSP2 <- riots_new[(t+1):(t+p),34] 
## PRCP2 <- riots_new[(t+1):(t+p),35] 
## zinc_Open2 <- riots_new[(t+1):(t+p),36] 
## zinc_Low2 <- riots_new[(t+1):(t+p),37] 
## WTI_Close2 <- riots_new[(t+1):(t+p),38] 
## WTI_Low2 <- riots_new[(t+1):(t+p),39] 
## wheat_Open2 <- riots_new[(t+1):(t+p),40] 
## wheat_High2 <- riots_new[(t+1):(t+p),41] 
## wheat_Low2 <- riots_new[(t+1):(t+p),42] 
## tin_Close2 <- riots_new[(t+1):(t+p),43] 
## tin_High2 <- riots_new[(t+1):(t+p),44] 
## tin_Low2 <- riots_new[(t+1):(t+p),45] 
## sugar_Close2 <- riots_new[(t+1):(t+p),46] 
## sugar_Open2 <- riots_new[(t+1):(t+p),47] 
## sugar_Low2 <- riots_new[(t+1):(t+p),48] 
## soybean_Close2 <- riots_new[(t+1):(t+p),49] 
## soybean_Open2 <- riots_new[(t+1):(t+p),50] 
## soybean_High2 <- riots_new[(t+1):(t+p),51] 
## soybean_Low2 <- riots_new[(t+1):(t+p),52] 
## silver_High2 <- riots_new[(t+1):(t+p),53] 
## rice_Close2 <- riots_new[(t+1):(t+p),54] 
## rice_High2 <- riots_new[(t+1):(t+p),55] 
## platinum_Close2 <- riots_new[(t+1):(t+p),56] 
## natural_gas_Close2 <- riots_new[(t+1):(t+p),57] 
## monero_Close2 <- riots_new[(t+1):(t+p),58] 
## monero_High2 <- riots_new[(t+1):(t+p),59] 
## litecoin_Close2 <- riots_new[(t+1):(t+p),60] 
## litecoin_Open2 <- riots_new[(t+1):(t+p),61] 
## litecoin_Low2 <- riots_new[(t+1):(t+p),62] 
## lead_High2 <- riots_new[(t+1):(t+p),63] 
## lead_Low2 <- riots_new[(t+1):(t+p),64] 
## Gold_Low2 <- riots_new[(t+1):(t+p),65] 
## cotton_Low2 <- riots_new[(t+1):(t+p),66] 
## corn_High2 <- riots_new[(t+1):(t+p),67] 
## corn_Low2 <- riots_new[(t+1):(t+p),68] 
## copper_Close2 <- riots_new[(t+1):(t+p),69] 
## copper_High2 <- riots_new[(t+1):(t+p),70] 
## coffee_Close2 <- riots_new[(t+1):(t+p),71] 
## coffee_High2 <- riots_new[(t+1):(t+p),72] 
## live_cattle_Close2 <- riots_new[(t+1):(t+p),73] 
## live_cattle_High2 <- riots_new[(t+1):(t+p),74] 
## live_cattle_Low2 <- riots_new[(t+1):(t+p),75] 
## feed_cattle_Close2 <- riots_new[(t+1):(t+p),76] 
## feed_cattle_Open2 <- riots_new[(t+1):(t+p),77] 
## Brent_Close2 <- riots_new[(t+1):(t+p),78] 
## Brent_Open2 <- riots_new[(t+1):(t+p),79] 
## Bitcoin_Open2 <- riots_new[(t+1):(t+p),80] 
## Bitcoin_Low2 <- riots_new[(t+1):(t+p),81] 
## BAX_Open2 <- riots_new[(t+1):(t+p),82] 
## BAX_High2 <- riots_new[(t+1):(t+p),83] 
## protest_count2 <- riots_new[(t+1):(t+p),84] 
## VAC_count2 <- riots_new[(t+1):(t+p),85] 
## total_fatalities2 <- riots_new[(t+1):(t+p),86] 
## category2 <- riots_new[(t+1):(t+p),87]

Rolling Horizon & Random Forest Design (Categorical Data)

### Define s
s <- 1

### Initial train set sample size
i.train.sample <- 700

### One time period
p <- 31

### Define t
t <- i.train.sample + s

### Defining a train set
# Define predictors
## For time t - 1
al_wafa_Total_Posts1 <- riots_new[1:(t-1),1] 
bahrain_moi_Total_Posts1 <- riots_new[1:(t-1),2] 
BahrainRights_Favorites1 <- riots_new[1:(t-1),3] 
BahrainRights_Retweets1 <- riots_new[1:(t-1),4] 
BBCArabic_Total_Posts1 <- riots_new[1:(t-1),5] 
BBCArabic_Favorites1 <- riots_new[1:(t-1),6] 
BBCArabic_Retweets1 <- riots_new[1:(t-1),7] 
bh14feb2011_Total_Posts1 <- riots_new[1:(t-1),8] 
bh14feb2011_Favorites1 <- riots_new[1:(t-1),9] 
bh14feb2011_Retweets1 <- riots_new[1:(t-1),10] 
bna_ar_Total_Posts1 <- riots_new[1:(t-1),11] 
bna_ar_Retweets1 <- riots_new[1:(t-1),12] 
Coalition14_Total_Posts1 <- riots_new[1:(t-1),13] 
Coalition14_Favorites1 <- riots_new[1:(t-1),14] 
duraz_youth_Total_Posts1 <- riots_new[1:(t-1),15] 
duraz_youth_Favorites1 <- riots_new[1:(t-1),16] 
duraz_youth_Retweets1 <- riots_new[1:(t-1),17] 
feb14revolution_Retweets1 <- riots_new[1:(t-1),18] 
GDNonline_Total_Posts1 <- riots_new[1:(t-1),19] 
GDNonline_Favorites1 <- riots_new[1:(t-1),20] 
GDNonline_Retweets1 <- riots_new[1:(t-1),21] 
Iran_Total_Posts1 <- riots_new[1:(t-1),22] 
Iran_Favorites1 <- riots_new[1:(t-1),23] 
IranNW_Retweets1 <- riots_new[1:(t-1),24] 
malarab1_Favorites1 <- riots_new[1:(t-1),25] 
NABEELRAJAB_Favorites1 <- riots_new[1:(t-1),26] 
netanyahu_Favorites1 <- riots_new[1:(t-1),27] 
netanyahu_Retweets1 <- riots_new[1:(t-1),28] 
rouhani_Total_Posts1 <- riots_new[1:(t-1),29] 
rouhani_Favorites1 <- riots_new[1:(t-1),30] 
USEmbassyManama_Retweets1 <- riots_new[1:(t-1),31] 
TEMP1 <- riots_new[1:(t-1),32] 
DEWP1 <- riots_new[1:(t-1),33] 
WDSP1 <- riots_new[1:(t-1),34] 
PRCP1 <- riots_new[1:(t-1),35] 
zinc_Open1 <- riots_new[1:(t-1),36] 
zinc_Low1 <- riots_new[1:(t-1),37] 
WTI_Close1 <- riots_new[1:(t-1),38] 
WTI_Low1 <- riots_new[1:(t-1),39] 
wheat_Open1 <- riots_new[1:(t-1),40] 
wheat_High1 <- riots_new[1:(t-1),41] 
wheat_Low1 <- riots_new[1:(t-1),42] 
tin_Close1 <- riots_new[1:(t-1),43] 
tin_High1 <- riots_new[1:(t-1),44] 
tin_Low1 <- riots_new[1:(t-1),45] 
sugar_Close1 <- riots_new[1:(t-1),46] 
sugar_Open1 <- riots_new[1:(t-1),47] 
sugar_Low1 <- riots_new[1:(t-1),48] 
soybean_Close1 <- riots_new[1:(t-1),49] 
soybean_Open1 <- riots_new[1:(t-1),50] 
soybean_High1 <- riots_new[1:(t-1),51] 
soybean_Low1 <- riots_new[1:(t-1),52] 
silver_High1 <- riots_new[1:(t-1),53] 
rice_Close1 <- riots_new[1:(t-1),54] 
rice_High1 <- riots_new[1:(t-1),55] 
platinum_Close1 <- riots_new[1:(t-1),56] 
natural_gas_Close1 <- riots_new[1:(t-1),57] 
monero_Close1 <- riots_new[1:(t-1),58] 
monero_High1 <- riots_new[1:(t-1),59] 
litecoin_Close1 <- riots_new[1:(t-1),60] 
litecoin_Open1 <- riots_new[1:(t-1),61] 
litecoin_Low1 <- riots_new[1:(t-1),62] 
lead_High1 <- riots_new[1:(t-1),63] 
lead_Low1 <- riots_new[1:(t-1),64] 
Gold_Low1 <- riots_new[1:(t-1),65] 
cotton_Low1 <- riots_new[1:(t-1),66] 
corn_High1 <- riots_new[1:(t-1),67] 
corn_Low1 <- riots_new[1:(t-1),68] 
copper_Close1 <- riots_new[1:(t-1),69] 
copper_High1 <- riots_new[1:(t-1),70] 
coffee_Close1 <- riots_new[1:(t-1),71] 
coffee_High1 <- riots_new[1:(t-1),72] 
live_cattle_Close1 <- riots_new[1:(t-1),73] 
live_cattle_High1 <- riots_new[1:(t-1),74] 
live_cattle_Low1 <- riots_new[1:(t-1),75] 
feed_cattle_Close1 <- riots_new[1:(t-1),76] 
feed_cattle_Open1 <- riots_new[1:(t-1),77] 
Brent_Close1 <- riots_new[1:(t-1),78] 
Brent_Open1 <- riots_new[1:(t-1),79] 
Bitcoin_Open1 <- riots_new[1:(t-1),80] 
Bitcoin_Low1 <- riots_new[1:(t-1),81] 
BAX_Open1 <- riots_new[1:(t-1),82] 
BAX_High1 <- riots_new[1:(t-1),83] 
protest_count1 <- riots_new[1:(t-1),84] 
VAC_count1 <- riots_new[1:(t-1),85] 
total_fatalities1 <- riots_new[1:(t-1),86] 
category1 <- riots_new[1:(t-1),87] 

## For time t
al_wafa_Total_Posts2 <- riots_new[2:t,1] 
bahrain_moi_Total_Posts2 <- riots_new[2:t,2] 
BahrainRights_Favorites2 <- riots_new[2:t,3] 
BahrainRights_Retweets2 <- riots_new[2:t,4] 
BBCArabic_Total_Posts2 <- riots_new[2:t,5] 
BBCArabic_Favorites2 <- riots_new[2:t,6] 
BBCArabic_Retweets2 <- riots_new[2:t,7] 
bh14feb2011_Total_Posts2 <- riots_new[2:t,8] 
bh14feb2011_Favorites2 <- riots_new[2:t,9] 
bh14feb2011_Retweets2 <- riots_new[2:t,10] 
bna_ar_Total_Posts2 <- riots_new[2:t,11] 
bna_ar_Retweets2 <- riots_new[2:t,12] 
Coalition14_Total_Posts2 <- riots_new[2:t,13] 
Coalition14_Favorites2 <- riots_new[2:t,14] 
duraz_youth_Total_Posts2 <- riots_new[2:t,15] 
duraz_youth_Favorites2 <- riots_new[2:t,16] 
duraz_youth_Retweets2 <- riots_new[2:t,17] 
feb14revolution_Retweets2 <- riots_new[2:t,18] 
GDNonline_Total_Posts2 <- riots_new[2:t,19] 
GDNonline_Favorites2 <- riots_new[2:t,20] 
GDNonline_Retweets2 <- riots_new[2:t,21] 
Iran_Total_Posts2 <- riots_new[2:t,22] 
Iran_Favorites2 <- riots_new[2:t,23] 
IranNW_Retweets2 <- riots_new[2:t,24] 
malarab1_Favorites2 <- riots_new[2:t,25] 
NABEELRAJAB_Favorites2 <- riots_new[2:t,26] 
netanyahu_Favorites2 <- riots_new[2:t,27] 
netanyahu_Retweets2 <- riots_new[2:t,28] 
rouhani_Total_Posts2 <- riots_new[2:t,29] 
rouhani_Favorites2 <- riots_new[2:t,30] 
USEmbassyManama_Retweets2 <- riots_new[2:t,31] 
TEMP2 <- riots_new[2:t,32] 
DEWP2 <- riots_new[2:t,33] 
WDSP2 <- riots_new[2:t,34] 
PRCP2 <- riots_new[2:t,35] 
zinc_Open2 <- riots_new[2:t,36] 
zinc_Low2 <- riots_new[2:t,37] 
WTI_Close2 <- riots_new[2:t,38] 
WTI_Low2 <- riots_new[2:t,39] 
wheat_Open2 <- riots_new[2:t,40] 
wheat_High2 <- riots_new[2:t,41] 
wheat_Low2 <- riots_new[2:t,42] 
tin_Close2 <- riots_new[2:t,43] 
tin_High2 <- riots_new[2:t,44] 
tin_Low2 <- riots_new[2:t,45] 
sugar_Close2 <- riots_new[2:t,46] 
sugar_Open2 <- riots_new[2:t,47] 
sugar_Low2 <- riots_new[2:t,48] 
soybean_Close2 <- riots_new[2:t,49] 
soybean_Open2 <- riots_new[2:t,50] 
soybean_High2 <- riots_new[2:t,51] 
soybean_Low2 <- riots_new[2:t,52] 
silver_High2 <- riots_new[2:t,53] 
rice_Close2 <- riots_new[2:t,54] 
rice_High2 <- riots_new[2:t,55] 
platinum_Close2 <- riots_new[2:t,56] 
natural_gas_Close2 <- riots_new[2:t,57] 
monero_Close2 <- riots_new[2:t,58] 
monero_High2 <- riots_new[2:t,59] 
litecoin_Close2 <- riots_new[2:t,60] 
litecoin_Open2 <- riots_new[2:t,61] 
litecoin_Low2 <- riots_new[2:t,62] 
lead_High2 <- riots_new[2:t,63] 
lead_Low2 <- riots_new[2:t,64] 
Gold_Low2 <- riots_new[2:t,65] 
cotton_Low2 <- riots_new[2:t,66] 
corn_High2 <- riots_new[2:t,67] 
corn_Low2 <- riots_new[2:t,68] 
copper_Close2 <- riots_new[2:t,69] 
copper_High2 <- riots_new[2:t,70] 
coffee_Close2 <- riots_new[2:t,71] 
coffee_High2 <- riots_new[2:t,72] 
live_cattle_Close2 <- riots_new[2:t,73] 
live_cattle_High2 <- riots_new[2:t,74] 
live_cattle_Low2 <- riots_new[2:t,75] 
feed_cattle_Close2 <- riots_new[2:t,76] 
feed_cattle_Open2 <- riots_new[2:t,77] 
Brent_Close2 <- riots_new[2:t,78] 
Brent_Open2 <- riots_new[2:t,79] 
Bitcoin_Open2 <- riots_new[2:t,80] 
Bitcoin_Low2 <- riots_new[2:t,81] 
BAX_Open2 <- riots_new[2:t,82] 
BAX_High2 <- riots_new[2:t,83] 
protest_count2 <- riots_new[2:t,84] 
VAC_count2 <- riots_new[2:t,85] 
total_fatalities2 <- riots_new[2:t,86] 
category2 <- riots_new[2:t,87]

# Now define the target
category3 <- riots_new[(2+s):(t+s), 87]

train <- data.frame(al_wafa_Total_Posts1, bahrain_moi_Total_Posts1, BahrainRights_Favorites1, BahrainRights_Retweets1, BBCArabic_Total_Posts1, BBCArabic_Favorites1, BBCArabic_Retweets1, bh14feb2011_Total_Posts1, bh14feb2011_Favorites1, bh14feb2011_Retweets1, bna_ar_Total_Posts1, bna_ar_Retweets1, Coalition14_Total_Posts1, Coalition14_Favorites1, duraz_youth_Total_Posts1, duraz_youth_Favorites1, duraz_youth_Retweets1, feb14revolution_Retweets1, GDNonline_Total_Posts1, GDNonline_Favorites1, GDNonline_Retweets1, Iran_Total_Posts1, Iran_Favorites1, IranNW_Retweets1, malarab1_Favorites1, NABEELRAJAB_Favorites1, netanyahu_Favorites1, netanyahu_Retweets1, rouhani_Total_Posts1, rouhani_Favorites1, USEmbassyManama_Retweets1, TEMP1, DEWP1, WDSP1, PRCP1, zinc_Open1, zinc_Low1, WTI_Close1, WTI_Low1, wheat_Open1, wheat_High1, wheat_Low1, tin_Close1, tin_High1, tin_Low1, sugar_Close1, sugar_Open1, sugar_Low1, soybean_Close1, soybean_Open1, soybean_High1, soybean_Low1, silver_High1, rice_Close1, rice_High1, platinum_Close1, natural_gas_Close1, monero_Close1, monero_High1, litecoin_Close1, litecoin_Open1, litecoin_Low1, lead_High1, lead_Low1, Gold_Low1, cotton_Low1, corn_High1, corn_Low1, copper_Close1, copper_High1, coffee_Close1, coffee_High1, live_cattle_Close1, live_cattle_High1, live_cattle_Low1, feed_cattle_Close1, feed_cattle_Open1, Brent_Close1, Brent_Open1, Bitcoin_Open1, Bitcoin_Low1, BAX_Open1, BAX_High1, protest_count1, VAC_count1, total_fatalities1, category1, al_wafa_Total_Posts2, bahrain_moi_Total_Posts2, BahrainRights_Favorites2, BahrainRights_Retweets2, BBCArabic_Total_Posts2, BBCArabic_Favorites2, BBCArabic_Retweets2, bh14feb2011_Total_Posts2, bh14feb2011_Favorites2, bh14feb2011_Retweets2, bna_ar_Total_Posts2, bna_ar_Retweets2, Coalition14_Total_Posts2, Coalition14_Favorites2, duraz_youth_Total_Posts2, duraz_youth_Favorites2, duraz_youth_Retweets2, feb14revolution_Retweets2, GDNonline_Total_Posts2, GDNonline_Favorites2, GDNonline_Retweets2, Iran_Total_Posts2, Iran_Favorites2, IranNW_Retweets2, malarab1_Favorites2, NABEELRAJAB_Favorites2, netanyahu_Favorites2, netanyahu_Retweets2, rouhani_Total_Posts2, rouhani_Favorites2, USEmbassyManama_Retweets2, TEMP2, DEWP2, WDSP2, PRCP2, zinc_Open2, zinc_Low2, WTI_Close2, WTI_Low2, wheat_Open2, wheat_High2, wheat_Low2, tin_Close2, tin_High2, tin_Low2, sugar_Close2, sugar_Open2, sugar_Low2, soybean_Close2, soybean_Open2, soybean_High2, soybean_Low2, silver_High2, rice_Close2, rice_High2, platinum_Close2, natural_gas_Close2, monero_Close2, monero_High2, litecoin_Close2, litecoin_Open2, litecoin_Low2, lead_High2, lead_Low2, Gold_Low2, cotton_Low2, corn_High2, corn_Low2, copper_Close2, copper_High2, coffee_Close2, coffee_High2, live_cattle_Close2, live_cattle_High2, live_cattle_Low2, feed_cattle_Close2, feed_cattle_Open2, Brent_Close2, Brent_Open2, Bitcoin_Open2, Bitcoin_Low2, BAX_Open2, BAX_High2, protest_count2, VAC_count2, total_fatalities2, category2, category3)

#summary(train)

### Defining a test set
# Define predictors
## For time t - 1
al_wafa_Total_Posts1 <- riots_new[t:(t+p-1),1] 
bahrain_moi_Total_Posts1 <- riots_new[t:(t+p-1),2] 
BahrainRights_Favorites1 <- riots_new[t:(t+p-1),3] 
BahrainRights_Retweets1 <- riots_new[t:(t+p-1),4] 
BBCArabic_Total_Posts1 <- riots_new[t:(t+p-1),5] 
BBCArabic_Favorites1 <- riots_new[t:(t+p-1),6] 
BBCArabic_Retweets1 <- riots_new[t:(t+p-1),7] 
bh14feb2011_Total_Posts1 <- riots_new[t:(t+p-1),8] 
bh14feb2011_Favorites1 <- riots_new[t:(t+p-1),9] 
bh14feb2011_Retweets1 <- riots_new[t:(t+p-1),10] 
bna_ar_Total_Posts1 <- riots_new[t:(t+p-1),11] 
bna_ar_Retweets1 <- riots_new[t:(t+p-1),12] 
Coalition14_Total_Posts1 <- riots_new[t:(t+p-1),13] 
Coalition14_Favorites1 <- riots_new[t:(t+p-1),14] 
duraz_youth_Total_Posts1 <- riots_new[t:(t+p-1),15] 
duraz_youth_Favorites1 <- riots_new[t:(t+p-1),16] 
duraz_youth_Retweets1 <- riots_new[t:(t+p-1),17] 
feb14revolution_Retweets1 <- riots_new[t:(t+p-1),18] 
GDNonline_Total_Posts1 <- riots_new[t:(t+p-1),19] 
GDNonline_Favorites1 <- riots_new[t:(t+p-1),20] 
GDNonline_Retweets1 <- riots_new[t:(t+p-1),21] 
Iran_Total_Posts1 <- riots_new[t:(t+p-1),22] 
Iran_Favorites1 <- riots_new[t:(t+p-1),23] 
IranNW_Retweets1 <- riots_new[t:(t+p-1),24] 
malarab1_Favorites1 <- riots_new[t:(t+p-1),25] 
NABEELRAJAB_Favorites1 <- riots_new[t:(t+p-1),26] 
netanyahu_Favorites1 <- riots_new[t:(t+p-1),27] 
netanyahu_Retweets1 <- riots_new[t:(t+p-1),28] 
rouhani_Total_Posts1 <- riots_new[t:(t+p-1),29] 
rouhani_Favorites1 <- riots_new[t:(t+p-1),30] 
USEmbassyManama_Retweets1 <- riots_new[t:(t+p-1),31] 
TEMP1 <- riots_new[t:(t+p-1),32] 
DEWP1 <- riots_new[t:(t+p-1),33] 
WDSP1 <- riots_new[t:(t+p-1),34] 
PRCP1 <- riots_new[t:(t+p-1),35] 
zinc_Open1 <- riots_new[t:(t+p-1),36] 
zinc_Low1 <- riots_new[t:(t+p-1),37] 
WTI_Close1 <- riots_new[t:(t+p-1),38] 
WTI_Low1 <- riots_new[t:(t+p-1),39] 
wheat_Open1 <- riots_new[t:(t+p-1),40] 
wheat_High1 <- riots_new[t:(t+p-1),41] 
wheat_Low1 <- riots_new[t:(t+p-1),42] 
tin_Close1 <- riots_new[t:(t+p-1),43] 
tin_High1 <- riots_new[t:(t+p-1),44] 
tin_Low1 <- riots_new[t:(t+p-1),45] 
sugar_Close1 <- riots_new[t:(t+p-1),46] 
sugar_Open1 <- riots_new[t:(t+p-1),47] 
sugar_Low1 <- riots_new[t:(t+p-1),48] 
soybean_Close1 <- riots_new[t:(t+p-1),49] 
soybean_Open1 <- riots_new[t:(t+p-1),50] 
soybean_High1 <- riots_new[t:(t+p-1),51] 
soybean_Low1 <- riots_new[t:(t+p-1),52] 
silver_High1 <- riots_new[t:(t+p-1),53] 
rice_Close1 <- riots_new[t:(t+p-1),54] 
rice_High1 <- riots_new[t:(t+p-1),55] 
platinum_Close1 <- riots_new[t:(t+p-1),56] 
natural_gas_Close1 <- riots_new[t:(t+p-1),57] 
monero_Close1 <- riots_new[t:(t+p-1),58] 
monero_High1 <- riots_new[t:(t+p-1),59] 
litecoin_Close1 <- riots_new[t:(t+p-1),60] 
litecoin_Open1 <- riots_new[t:(t+p-1),61] 
litecoin_Low1 <- riots_new[t:(t+p-1),62] 
lead_High1 <- riots_new[t:(t+p-1),63] 
lead_Low1 <- riots_new[t:(t+p-1),64] 
Gold_Low1 <- riots_new[t:(t+p-1),65] 
cotton_Low1 <- riots_new[t:(t+p-1),66] 
corn_High1 <- riots_new[t:(t+p-1),67] 
corn_Low1 <- riots_new[t:(t+p-1),68] 
copper_Close1 <- riots_new[t:(t+p-1),69] 
copper_High1 <- riots_new[t:(t+p-1),70] 
coffee_Close1 <- riots_new[t:(t+p-1),71] 
coffee_High1 <- riots_new[t:(t+p-1),72] 
live_cattle_Close1 <- riots_new[t:(t+p-1),73] 
live_cattle_High1 <- riots_new[t:(t+p-1),74] 
live_cattle_Low1 <- riots_new[t:(t+p-1),75] 
feed_cattle_Close1 <- riots_new[t:(t+p-1),76] 
feed_cattle_Open1 <- riots_new[t:(t+p-1),77] 
Brent_Close1 <- riots_new[t:(t+p-1),78] 
Brent_Open1 <- riots_new[t:(t+p-1),79] 
Bitcoin_Open1 <- riots_new[t:(t+p-1),80] 
Bitcoin_Low1 <- riots_new[t:(t+p-1),81] 
BAX_Open1 <- riots_new[t:(t+p-1),82] 
BAX_High1 <- riots_new[t:(t+p-1),83] 
protest_count1 <- riots_new[t:(t+p-1),84] 
VAC_count1 <- riots_new[t:(t+p-1),85] 
total_fatalities1 <- riots_new[t:(t+p-1),86] 
category1 <- riots_new[t:(t+p-1),87]

## For time t
al_wafa_Total_Posts2 <- riots_new[(t+1):(t+p),1] 
bahrain_moi_Total_Posts2 <- riots_new[(t+1):(t+p),2] 
BahrainRights_Favorites2 <- riots_new[(t+1):(t+p),3] 
BahrainRights_Retweets2 <- riots_new[(t+1):(t+p),4] 
BBCArabic_Total_Posts2 <- riots_new[(t+1):(t+p),5] 
BBCArabic_Favorites2 <- riots_new[(t+1):(t+p),6] 
BBCArabic_Retweets2 <- riots_new[(t+1):(t+p),7] 
bh14feb2011_Total_Posts2 <- riots_new[(t+1):(t+p),8] 
bh14feb2011_Favorites2 <- riots_new[(t+1):(t+p),9] 
bh14feb2011_Retweets2 <- riots_new[(t+1):(t+p),10] 
bna_ar_Total_Posts2 <- riots_new[(t+1):(t+p),11] 
bna_ar_Retweets2 <- riots_new[(t+1):(t+p),12] 
Coalition14_Total_Posts2 <- riots_new[(t+1):(t+p),13] 
Coalition14_Favorites2 <- riots_new[(t+1):(t+p),14] 
duraz_youth_Total_Posts2 <- riots_new[(t+1):(t+p),15] 
duraz_youth_Favorites2 <- riots_new[(t+1):(t+p),16] 
duraz_youth_Retweets2 <- riots_new[(t+1):(t+p),17] 
feb14revolution_Retweets2 <- riots_new[(t+1):(t+p),18] 
GDNonline_Total_Posts2 <- riots_new[(t+1):(t+p),19] 
GDNonline_Favorites2 <- riots_new[(t+1):(t+p),20] 
GDNonline_Retweets2 <- riots_new[(t+1):(t+p),21] 
Iran_Total_Posts2 <- riots_new[(t+1):(t+p),22] 
Iran_Favorites2 <- riots_new[(t+1):(t+p),23] 
IranNW_Retweets2 <- riots_new[(t+1):(t+p),24] 
malarab1_Favorites2 <- riots_new[(t+1):(t+p),25] 
NABEELRAJAB_Favorites2 <- riots_new[(t+1):(t+p),26] 
netanyahu_Favorites2 <- riots_new[(t+1):(t+p),27] 
netanyahu_Retweets2 <- riots_new[(t+1):(t+p),28] 
rouhani_Total_Posts2 <- riots_new[(t+1):(t+p),29] 
rouhani_Favorites2 <- riots_new[(t+1):(t+p),30] 
USEmbassyManama_Retweets2 <- riots_new[(t+1):(t+p),31] 
TEMP2 <- riots_new[(t+1):(t+p),32] 
DEWP2 <- riots_new[(t+1):(t+p),33] 
WDSP2 <- riots_new[(t+1):(t+p),34] 
PRCP2 <- riots_new[(t+1):(t+p),35] 
zinc_Open2 <- riots_new[(t+1):(t+p),36] 
zinc_Low2 <- riots_new[(t+1):(t+p),37] 
WTI_Close2 <- riots_new[(t+1):(t+p),38] 
WTI_Low2 <- riots_new[(t+1):(t+p),39] 
wheat_Open2 <- riots_new[(t+1):(t+p),40] 
wheat_High2 <- riots_new[(t+1):(t+p),41] 
wheat_Low2 <- riots_new[(t+1):(t+p),42] 
tin_Close2 <- riots_new[(t+1):(t+p),43] 
tin_High2 <- riots_new[(t+1):(t+p),44] 
tin_Low2 <- riots_new[(t+1):(t+p),45] 
sugar_Close2 <- riots_new[(t+1):(t+p),46] 
sugar_Open2 <- riots_new[(t+1):(t+p),47] 
sugar_Low2 <- riots_new[(t+1):(t+p),48] 
soybean_Close2 <- riots_new[(t+1):(t+p),49] 
soybean_Open2 <- riots_new[(t+1):(t+p),50] 
soybean_High2 <- riots_new[(t+1):(t+p),51] 
soybean_Low2 <- riots_new[(t+1):(t+p),52] 
silver_High2 <- riots_new[(t+1):(t+p),53] 
rice_Close2 <- riots_new[(t+1):(t+p),54] 
rice_High2 <- riots_new[(t+1):(t+p),55] 
platinum_Close2 <- riots_new[(t+1):(t+p),56] 
natural_gas_Close2 <- riots_new[(t+1):(t+p),57] 
monero_Close2 <- riots_new[(t+1):(t+p),58] 
monero_High2 <- riots_new[(t+1):(t+p),59] 
litecoin_Close2 <- riots_new[(t+1):(t+p),60] 
litecoin_Open2 <- riots_new[(t+1):(t+p),61] 
litecoin_Low2 <- riots_new[(t+1):(t+p),62] 
lead_High2 <- riots_new[(t+1):(t+p),63] 
lead_Low2 <- riots_new[(t+1):(t+p),64] 
Gold_Low2 <- riots_new[(t+1):(t+p),65] 
cotton_Low2 <- riots_new[(t+1):(t+p),66] 
corn_High2 <- riots_new[(t+1):(t+p),67] 
corn_Low2 <- riots_new[(t+1):(t+p),68] 
copper_Close2 <- riots_new[(t+1):(t+p),69] 
copper_High2 <- riots_new[(t+1):(t+p),70] 
coffee_Close2 <- riots_new[(t+1):(t+p),71] 
coffee_High2 <- riots_new[(t+1):(t+p),72] 
live_cattle_Close2 <- riots_new[(t+1):(t+p),73] 
live_cattle_High2 <- riots_new[(t+1):(t+p),74] 
live_cattle_Low2 <- riots_new[(t+1):(t+p),75] 
feed_cattle_Close2 <- riots_new[(t+1):(t+p),76] 
feed_cattle_Open2 <- riots_new[(t+1):(t+p),77] 
Brent_Close2 <- riots_new[(t+1):(t+p),78] 
Brent_Open2 <- riots_new[(t+1):(t+p),79] 
Bitcoin_Open2 <- riots_new[(t+1):(t+p),80] 
Bitcoin_Low2 <- riots_new[(t+1):(t+p),81] 
BAX_Open2 <- riots_new[(t+1):(t+p),82] 
BAX_High2 <- riots_new[(t+1):(t+p),83] 
protest_count2 <- riots_new[(t+1):(t+p),84] 
VAC_count2 <- riots_new[(t+1):(t+p),85] 
total_fatalities2 <- riots_new[(t+1):(t+p),86] 
category2 <- riots_new[(t+1):(t+p),87]

# Now define the target
category3 <- riots_new[(t+1+s):(t+p+s), 87]

test <- data.frame(al_wafa_Total_Posts1, bahrain_moi_Total_Posts1, BahrainRights_Favorites1, BahrainRights_Retweets1, BBCArabic_Total_Posts1, BBCArabic_Favorites1, BBCArabic_Retweets1, bh14feb2011_Total_Posts1, bh14feb2011_Favorites1, bh14feb2011_Retweets1, bna_ar_Total_Posts1, bna_ar_Retweets1, Coalition14_Total_Posts1, Coalition14_Favorites1, duraz_youth_Total_Posts1, duraz_youth_Favorites1, duraz_youth_Retweets1, feb14revolution_Retweets1, GDNonline_Total_Posts1, GDNonline_Favorites1, GDNonline_Retweets1, Iran_Total_Posts1, Iran_Favorites1, IranNW_Retweets1, malarab1_Favorites1, NABEELRAJAB_Favorites1, netanyahu_Favorites1, netanyahu_Retweets1, rouhani_Total_Posts1, rouhani_Favorites1, USEmbassyManama_Retweets1, TEMP1, DEWP1, WDSP1, PRCP1, zinc_Open1, zinc_Low1, WTI_Close1, WTI_Low1, wheat_Open1, wheat_High1, wheat_Low1, tin_Close1, tin_High1, tin_Low1, sugar_Close1, sugar_Open1, sugar_Low1, soybean_Close1, soybean_Open1, soybean_High1, soybean_Low1, silver_High1, rice_Close1, rice_High1, platinum_Close1, natural_gas_Close1, monero_Close1, monero_High1, litecoin_Close1, litecoin_Open1, litecoin_Low1, lead_High1, lead_Low1, Gold_Low1, cotton_Low1, corn_High1, corn_Low1, copper_Close1, copper_High1, coffee_Close1, coffee_High1, live_cattle_Close1, live_cattle_High1, live_cattle_Low1, feed_cattle_Close1, feed_cattle_Open1, Brent_Close1, Brent_Open1, Bitcoin_Open1, Bitcoin_Low1, BAX_Open1, BAX_High1, protest_count1, VAC_count1, total_fatalities1, category1, al_wafa_Total_Posts2, bahrain_moi_Total_Posts2, BahrainRights_Favorites2, BahrainRights_Retweets2, BBCArabic_Total_Posts2, BBCArabic_Favorites2, BBCArabic_Retweets2, bh14feb2011_Total_Posts2, bh14feb2011_Favorites2, bh14feb2011_Retweets2, bna_ar_Total_Posts2, bna_ar_Retweets2, Coalition14_Total_Posts2, Coalition14_Favorites2, duraz_youth_Total_Posts2, duraz_youth_Favorites2, duraz_youth_Retweets2, feb14revolution_Retweets2, GDNonline_Total_Posts2, GDNonline_Favorites2, GDNonline_Retweets2, Iran_Total_Posts2, Iran_Favorites2, IranNW_Retweets2, malarab1_Favorites2, NABEELRAJAB_Favorites2, netanyahu_Favorites2, netanyahu_Retweets2, rouhani_Total_Posts2, rouhani_Favorites2, USEmbassyManama_Retweets2, TEMP2, DEWP2, WDSP2, PRCP2, zinc_Open2, zinc_Low2, WTI_Close2, WTI_Low2, wheat_Open2, wheat_High2, wheat_Low2, tin_Close2, tin_High2, tin_Low2, sugar_Close2, sugar_Open2, sugar_Low2, soybean_Close2, soybean_Open2, soybean_High2, soybean_Low2, silver_High2, rice_Close2, rice_High2, platinum_Close2, natural_gas_Close2, monero_Close2, monero_High2, litecoin_Close2, litecoin_Open2, litecoin_Low2, lead_High2, lead_Low2, Gold_Low2, cotton_Low2, corn_High2, corn_Low2, copper_Close2, copper_High2, coffee_Close2, coffee_High2, live_cattle_Close2, live_cattle_High2, live_cattle_Low2, feed_cattle_Close2, feed_cattle_Open2, Brent_Close2, Brent_Open2, Bitcoin_Open2, Bitcoin_Low2, BAX_Open2, BAX_High2, protest_count2, VAC_count2, total_fatalities2, category2, category3)

#summary(test)

## Applying RF
RandomForest <- randomForest(category3 ~ ., data=train, importance = TRUE, ntrees = nrow(train), weights = 1000/table(train$category))

predRF <- predict(RandomForest, newdata = test, type = "response")
# Get the predicted category
predClassRF <- predict(RandomForest, newdata = test, type = "response")

# Report confusion matrix from on the test dataset
confusionMatrix(predClassRF, test$category3)
## Confusion Matrix and Statistics
## 
##                Reference
## Prediction      High_risk Low_risk Moderate_risk
##   High_risk             0        0             0
##   Low_risk              0        0             0
##   Moderate_risk         1        5            25
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8065          
##                  95% CI : (0.6253, 0.9255)
##     No Information Rate : 0.8065          
##     P-Value [Acc > NIR] : 0.6069          
##                                           
##                   Kappa : 0               
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: High_risk Class: Low_risk Class: Moderate_risk
## Sensitivity                   0.00000          0.0000               1.0000
## Specificity                   1.00000          1.0000               0.0000
## Pos Pred Value                    NaN             NaN               0.8065
## Neg Pred Value                0.96774          0.8387                  NaN
## Prevalence                    0.03226          0.1613               0.8065
## Detection Rate                0.00000          0.0000               0.8065
## Detection Prevalence          0.00000          0.0000               1.0000
## Balanced Accuracy             0.50000          0.5000               0.5000
RF.Pre <- list()
RF.accuracy <- list()
RF.CM.List <- list()
RF.Pre[[1]] <- predRF
k <- 1

#for(t in (i.train.sample + s + 1):(d[1]-p-s)){
for(t in (i.train.sample + s + 1):(i.train.sample + s + 11)){
  k <- k + 1
  ### Defining a train set
  # Define predictors
  ## For time t - 1
  al_wafa_Total_Posts1 <- riots_new[1:(t-1),1] 
  bahrain_moi_Total_Posts1 <- riots_new[1:(t-1),2] 
  BahrainRights_Favorites1 <- riots_new[1:(t-1),3] 
  BahrainRights_Retweets1 <- riots_new[1:(t-1),4] 
  BBCArabic_Total_Posts1 <- riots_new[1:(t-1),5] 
  BBCArabic_Favorites1 <- riots_new[1:(t-1),6] 
  BBCArabic_Retweets1 <- riots_new[1:(t-1),7] 
  bh14feb2011_Total_Posts1 <- riots_new[1:(t-1),8] 
  bh14feb2011_Favorites1 <- riots_new[1:(t-1),9] 
  bh14feb2011_Retweets1 <- riots_new[1:(t-1),10] 
  bna_ar_Total_Posts1 <- riots_new[1:(t-1),11] 
  bna_ar_Retweets1 <- riots_new[1:(t-1),12] 
  Coalition14_Total_Posts1 <- riots_new[1:(t-1),13] 
  Coalition14_Favorites1 <- riots_new[1:(t-1),14] 
  duraz_youth_Total_Posts1 <- riots_new[1:(t-1),15] 
  duraz_youth_Favorites1 <- riots_new[1:(t-1),16] 
  duraz_youth_Retweets1 <- riots_new[1:(t-1),17] 
  feb14revolution_Retweets1 <- riots_new[1:(t-1),18] 
  GDNonline_Total_Posts1 <- riots_new[1:(t-1),19] 
  GDNonline_Favorites1 <- riots_new[1:(t-1),20] 
  GDNonline_Retweets1 <- riots_new[1:(t-1),21] 
  Iran_Total_Posts1 <- riots_new[1:(t-1),22] 
  Iran_Favorites1 <- riots_new[1:(t-1),23] 
  IranNW_Retweets1 <- riots_new[1:(t-1),24] 
  malarab1_Favorites1 <- riots_new[1:(t-1),25] 
  NABEELRAJAB_Favorites1 <- riots_new[1:(t-1),26] 
  netanyahu_Favorites1 <- riots_new[1:(t-1),27] 
  netanyahu_Retweets1 <- riots_new[1:(t-1),28] 
  rouhani_Total_Posts1 <- riots_new[1:(t-1),29] 
  rouhani_Favorites1 <- riots_new[1:(t-1),30] 
  USEmbassyManama_Retweets1 <- riots_new[1:(t-1),31] 
  TEMP1 <- riots_new[1:(t-1),32] 
  DEWP1 <- riots_new[1:(t-1),33] 
  WDSP1 <- riots_new[1:(t-1),34] 
  PRCP1 <- riots_new[1:(t-1),35] 
  zinc_Open1 <- riots_new[1:(t-1),36] 
  zinc_Low1 <- riots_new[1:(t-1),37] 
  WTI_Close1 <- riots_new[1:(t-1),38] 
  WTI_Low1 <- riots_new[1:(t-1),39] 
  wheat_Open1 <- riots_new[1:(t-1),40] 
  wheat_High1 <- riots_new[1:(t-1),41] 
  wheat_Low1 <- riots_new[1:(t-1),42] 
  tin_Close1 <- riots_new[1:(t-1),43] 
  tin_High1 <- riots_new[1:(t-1),44] 
  tin_Low1 <- riots_new[1:(t-1),45] 
  sugar_Close1 <- riots_new[1:(t-1),46] 
  sugar_Open1 <- riots_new[1:(t-1),47] 
  sugar_Low1 <- riots_new[1:(t-1),48] 
  soybean_Close1 <- riots_new[1:(t-1),49] 
  soybean_Open1 <- riots_new[1:(t-1),50] 
  soybean_High1 <- riots_new[1:(t-1),51] 
  soybean_Low1 <- riots_new[1:(t-1),52] 
  silver_High1 <- riots_new[1:(t-1),53] 
  rice_Close1 <- riots_new[1:(t-1),54] 
  rice_High1 <- riots_new[1:(t-1),55] 
  platinum_Close1 <- riots_new[1:(t-1),56] 
  natural_gas_Close1 <- riots_new[1:(t-1),57] 
  monero_Close1 <- riots_new[1:(t-1),58] 
  monero_High1 <- riots_new[1:(t-1),59] 
  litecoin_Close1 <- riots_new[1:(t-1),60] 
  litecoin_Open1 <- riots_new[1:(t-1),61] 
  litecoin_Low1 <- riots_new[1:(t-1),62] 
  lead_High1 <- riots_new[1:(t-1),63] 
  lead_Low1 <- riots_new[1:(t-1),64] 
  Gold_Low1 <- riots_new[1:(t-1),65] 
  cotton_Low1 <- riots_new[1:(t-1),66] 
  corn_High1 <- riots_new[1:(t-1),67] 
  corn_Low1 <- riots_new[1:(t-1),68] 
  copper_Close1 <- riots_new[1:(t-1),69] 
  copper_High1 <- riots_new[1:(t-1),70] 
  coffee_Close1 <- riots_new[1:(t-1),71] 
  coffee_High1 <- riots_new[1:(t-1),72] 
  live_cattle_Close1 <- riots_new[1:(t-1),73] 
  live_cattle_High1 <- riots_new[1:(t-1),74] 
  live_cattle_Low1 <- riots_new[1:(t-1),75] 
  feed_cattle_Close1 <- riots_new[1:(t-1),76] 
  feed_cattle_Open1 <- riots_new[1:(t-1),77] 
  Brent_Close1 <- riots_new[1:(t-1),78] 
  Brent_Open1 <- riots_new[1:(t-1),79] 
  Bitcoin_Open1 <- riots_new[1:(t-1),80] 
  Bitcoin_Low1 <- riots_new[1:(t-1),81] 
  BAX_Open1 <- riots_new[1:(t-1),82] 
  BAX_High1 <- riots_new[1:(t-1),83] 
  protest_count1 <- riots_new[1:(t-1),84] 
  VAC_count1 <- riots_new[1:(t-1),85] 
  total_fatalities1 <- riots_new[1:(t-1),86] 
  category1 <- riots_new[1:(t-1),87] 
  
  ## For time t
  al_wafa_Total_Posts2 <- riots_new[2:t,1] 
  bahrain_moi_Total_Posts2 <- riots_new[2:t,2] 
  BahrainRights_Favorites2 <- riots_new[2:t,3] 
  BahrainRights_Retweets2 <- riots_new[2:t,4] 
  BBCArabic_Total_Posts2 <- riots_new[2:t,5] 
  BBCArabic_Favorites2 <- riots_new[2:t,6] 
  BBCArabic_Retweets2 <- riots_new[2:t,7] 
  bh14feb2011_Total_Posts2 <- riots_new[2:t,8] 
  bh14feb2011_Favorites2 <- riots_new[2:t,9] 
  bh14feb2011_Retweets2 <- riots_new[2:t,10] 
  bna_ar_Total_Posts2 <- riots_new[2:t,11] 
  bna_ar_Retweets2 <- riots_new[2:t,12] 
  Coalition14_Total_Posts2 <- riots_new[2:t,13] 
  Coalition14_Favorites2 <- riots_new[2:t,14] 
  duraz_youth_Total_Posts2 <- riots_new[2:t,15] 
  duraz_youth_Favorites2 <- riots_new[2:t,16] 
  duraz_youth_Retweets2 <- riots_new[2:t,17] 
  feb14revolution_Retweets2 <- riots_new[2:t,18] 
  GDNonline_Total_Posts2 <- riots_new[2:t,19] 
  GDNonline_Favorites2 <- riots_new[2:t,20] 
  GDNonline_Retweets2 <- riots_new[2:t,21] 
  Iran_Total_Posts2 <- riots_new[2:t,22] 
  Iran_Favorites2 <- riots_new[2:t,23] 
  IranNW_Retweets2 <- riots_new[2:t,24] 
  malarab1_Favorites2 <- riots_new[2:t,25] 
  NABEELRAJAB_Favorites2 <- riots_new[2:t,26] 
  netanyahu_Favorites2 <- riots_new[2:t,27] 
  netanyahu_Retweets2 <- riots_new[2:t,28] 
  rouhani_Total_Posts2 <- riots_new[2:t,29] 
  rouhani_Favorites2 <- riots_new[2:t,30] 
  USEmbassyManama_Retweets2 <- riots_new[2:t,31] 
  TEMP2 <- riots_new[2:t,32] 
  DEWP2 <- riots_new[2:t,33] 
  WDSP2 <- riots_new[2:t,34] 
  PRCP2 <- riots_new[2:t,35] 
  zinc_Open2 <- riots_new[2:t,36] 
  zinc_Low2 <- riots_new[2:t,37] 
  WTI_Close2 <- riots_new[2:t,38] 
  WTI_Low2 <- riots_new[2:t,39] 
  wheat_Open2 <- riots_new[2:t,40] 
  wheat_High2 <- riots_new[2:t,41] 
  wheat_Low2 <- riots_new[2:t,42] 
  tin_Close2 <- riots_new[2:t,43] 
  tin_High2 <- riots_new[2:t,44] 
  tin_Low2 <- riots_new[2:t,45] 
  sugar_Close2 <- riots_new[2:t,46] 
  sugar_Open2 <- riots_new[2:t,47] 
  sugar_Low2 <- riots_new[2:t,48] 
  soybean_Close2 <- riots_new[2:t,49] 
  soybean_Open2 <- riots_new[2:t,50] 
  soybean_High2 <- riots_new[2:t,51] 
  soybean_Low2 <- riots_new[2:t,52] 
  silver_High2 <- riots_new[2:t,53] 
  rice_Close2 <- riots_new[2:t,54] 
  rice_High2 <- riots_new[2:t,55] 
  platinum_Close2 <- riots_new[2:t,56] 
  natural_gas_Close2 <- riots_new[2:t,57] 
  monero_Close2 <- riots_new[2:t,58] 
  monero_High2 <- riots_new[2:t,59] 
  litecoin_Close2 <- riots_new[2:t,60] 
  litecoin_Open2 <- riots_new[2:t,61] 
  litecoin_Low2 <- riots_new[2:t,62] 
  lead_High2 <- riots_new[2:t,63] 
  lead_Low2 <- riots_new[2:t,64] 
  Gold_Low2 <- riots_new[2:t,65] 
  cotton_Low2 <- riots_new[2:t,66] 
  corn_High2 <- riots_new[2:t,67] 
  corn_Low2 <- riots_new[2:t,68] 
  copper_Close2 <- riots_new[2:t,69] 
  copper_High2 <- riots_new[2:t,70] 
  coffee_Close2 <- riots_new[2:t,71] 
  coffee_High2 <- riots_new[2:t,72] 
  live_cattle_Close2 <- riots_new[2:t,73] 
  live_cattle_High2 <- riots_new[2:t,74] 
  live_cattle_Low2 <- riots_new[2:t,75] 
  feed_cattle_Close2 <- riots_new[2:t,76] 
  feed_cattle_Open2 <- riots_new[2:t,77] 
  Brent_Close2 <- riots_new[2:t,78] 
  Brent_Open2 <- riots_new[2:t,79] 
  Bitcoin_Open2 <- riots_new[2:t,80] 
  Bitcoin_Low2 <- riots_new[2:t,81] 
  BAX_Open2 <- riots_new[2:t,82] 
  BAX_High2 <- riots_new[2:t,83] 
  protest_count2 <- riots_new[2:t,84] 
  VAC_count2 <- riots_new[2:t,85] 
  total_fatalities2 <- riots_new[2:t,86] 
  category2 <- riots_new[2:t,87]
  
  # Now define the target
  category3 <- riots_new[(2+s):(t+s), 87]
  
  train <- data.frame(al_wafa_Total_Posts1, bahrain_moi_Total_Posts1, BahrainRights_Favorites1, BahrainRights_Retweets1, BBCArabic_Total_Posts1, BBCArabic_Favorites1, BBCArabic_Retweets1, bh14feb2011_Total_Posts1, bh14feb2011_Favorites1, bh14feb2011_Retweets1, bna_ar_Total_Posts1, bna_ar_Retweets1, Coalition14_Total_Posts1, Coalition14_Favorites1, duraz_youth_Total_Posts1, duraz_youth_Favorites1, duraz_youth_Retweets1, feb14revolution_Retweets1, GDNonline_Total_Posts1, GDNonline_Favorites1, GDNonline_Retweets1, Iran_Total_Posts1, Iran_Favorites1, IranNW_Retweets1, malarab1_Favorites1, NABEELRAJAB_Favorites1, netanyahu_Favorites1, netanyahu_Retweets1, rouhani_Total_Posts1, rouhani_Favorites1, USEmbassyManama_Retweets1, TEMP1, DEWP1, WDSP1, PRCP1, zinc_Open1, zinc_Low1, WTI_Close1, WTI_Low1, wheat_Open1, wheat_High1, wheat_Low1, tin_Close1, tin_High1, tin_Low1, sugar_Close1, sugar_Open1, sugar_Low1, soybean_Close1, soybean_Open1, soybean_High1, soybean_Low1, silver_High1, rice_Close1, rice_High1, platinum_Close1, natural_gas_Close1, monero_Close1, monero_High1, litecoin_Close1, litecoin_Open1, litecoin_Low1, lead_High1, lead_Low1, Gold_Low1, cotton_Low1, corn_High1, corn_Low1, copper_Close1, copper_High1, coffee_Close1, coffee_High1, live_cattle_Close1, live_cattle_High1, live_cattle_Low1, feed_cattle_Close1, feed_cattle_Open1, Brent_Close1, Brent_Open1, Bitcoin_Open1, Bitcoin_Low1, BAX_Open1, BAX_High1, protest_count1, VAC_count1, total_fatalities1, category1, al_wafa_Total_Posts2, bahrain_moi_Total_Posts2, BahrainRights_Favorites2, BahrainRights_Retweets2, BBCArabic_Total_Posts2, BBCArabic_Favorites2, BBCArabic_Retweets2, bh14feb2011_Total_Posts2, bh14feb2011_Favorites2, bh14feb2011_Retweets2, bna_ar_Total_Posts2, bna_ar_Retweets2, Coalition14_Total_Posts2, Coalition14_Favorites2, duraz_youth_Total_Posts2, duraz_youth_Favorites2, duraz_youth_Retweets2, feb14revolution_Retweets2, GDNonline_Total_Posts2, GDNonline_Favorites2, GDNonline_Retweets2, Iran_Total_Posts2, Iran_Favorites2, IranNW_Retweets2, malarab1_Favorites2, NABEELRAJAB_Favorites2, netanyahu_Favorites2, netanyahu_Retweets2, rouhani_Total_Posts2, rouhani_Favorites2, USEmbassyManama_Retweets2, TEMP2, DEWP2, WDSP2, PRCP2, zinc_Open2, zinc_Low2, WTI_Close2, WTI_Low2, wheat_Open2, wheat_High2, wheat_Low2, tin_Close2, tin_High2, tin_Low2, sugar_Close2, sugar_Open2, sugar_Low2, soybean_Close2, soybean_Open2, soybean_High2, soybean_Low2, silver_High2, rice_Close2, rice_High2, platinum_Close2, natural_gas_Close2, monero_Close2, monero_High2, litecoin_Close2, litecoin_Open2, litecoin_Low2, lead_High2, lead_Low2, Gold_Low2, cotton_Low2, corn_High2, corn_Low2, copper_Close2, copper_High2, coffee_Close2, coffee_High2, live_cattle_Close2, live_cattle_High2, live_cattle_Low2, feed_cattle_Close2, feed_cattle_Open2, Brent_Close2, Brent_Open2, Bitcoin_Open2, Bitcoin_Low2, BAX_Open2, BAX_High2, protest_count2, VAC_count2, total_fatalities2, category2, category3)
  
  #summary(train)
  
  ### Defining a test set
  # Define predictors
  ## For time t - 1
  al_wafa_Total_Posts1 <- riots_new[t:(t+p-1),1] 
  bahrain_moi_Total_Posts1 <- riots_new[t:(t+p-1),2] 
  BahrainRights_Favorites1 <- riots_new[t:(t+p-1),3] 
  BahrainRights_Retweets1 <- riots_new[t:(t+p-1),4] 
  BBCArabic_Total_Posts1 <- riots_new[t:(t+p-1),5] 
  BBCArabic_Favorites1 <- riots_new[t:(t+p-1),6] 
  BBCArabic_Retweets1 <- riots_new[t:(t+p-1),7] 
  bh14feb2011_Total_Posts1 <- riots_new[t:(t+p-1),8] 
  bh14feb2011_Favorites1 <- riots_new[t:(t+p-1),9] 
  bh14feb2011_Retweets1 <- riots_new[t:(t+p-1),10] 
  bna_ar_Total_Posts1 <- riots_new[t:(t+p-1),11] 
  bna_ar_Retweets1 <- riots_new[t:(t+p-1),12] 
  Coalition14_Total_Posts1 <- riots_new[t:(t+p-1),13] 
  Coalition14_Favorites1 <- riots_new[t:(t+p-1),14] 
  duraz_youth_Total_Posts1 <- riots_new[t:(t+p-1),15] 
  duraz_youth_Favorites1 <- riots_new[t:(t+p-1),16] 
  duraz_youth_Retweets1 <- riots_new[t:(t+p-1),17] 
  feb14revolution_Retweets1 <- riots_new[t:(t+p-1),18] 
  GDNonline_Total_Posts1 <- riots_new[t:(t+p-1),19] 
  GDNonline_Favorites1 <- riots_new[t:(t+p-1),20] 
  GDNonline_Retweets1 <- riots_new[t:(t+p-1),21] 
  Iran_Total_Posts1 <- riots_new[t:(t+p-1),22] 
  Iran_Favorites1 <- riots_new[t:(t+p-1),23] 
  IranNW_Retweets1 <- riots_new[t:(t+p-1),24] 
  malarab1_Favorites1 <- riots_new[t:(t+p-1),25] 
  NABEELRAJAB_Favorites1 <- riots_new[t:(t+p-1),26] 
  netanyahu_Favorites1 <- riots_new[t:(t+p-1),27] 
  netanyahu_Retweets1 <- riots_new[t:(t+p-1),28] 
  rouhani_Total_Posts1 <- riots_new[t:(t+p-1),29] 
  rouhani_Favorites1 <- riots_new[t:(t+p-1),30] 
  USEmbassyManama_Retweets1 <- riots_new[t:(t+p-1),31] 
  TEMP1 <- riots_new[t:(t+p-1),32] 
  DEWP1 <- riots_new[t:(t+p-1),33] 
  WDSP1 <- riots_new[t:(t+p-1),34] 
  PRCP1 <- riots_new[t:(t+p-1),35] 
  zinc_Open1 <- riots_new[t:(t+p-1),36] 
  zinc_Low1 <- riots_new[t:(t+p-1),37] 
  WTI_Close1 <- riots_new[t:(t+p-1),38] 
  WTI_Low1 <- riots_new[t:(t+p-1),39] 
  wheat_Open1 <- riots_new[t:(t+p-1),40] 
  wheat_High1 <- riots_new[t:(t+p-1),41] 
  wheat_Low1 <- riots_new[t:(t+p-1),42] 
  tin_Close1 <- riots_new[t:(t+p-1),43] 
  tin_High1 <- riots_new[t:(t+p-1),44] 
  tin_Low1 <- riots_new[t:(t+p-1),45] 
  sugar_Close1 <- riots_new[t:(t+p-1),46] 
  sugar_Open1 <- riots_new[t:(t+p-1),47] 
  sugar_Low1 <- riots_new[t:(t+p-1),48] 
  soybean_Close1 <- riots_new[t:(t+p-1),49] 
  soybean_Open1 <- riots_new[t:(t+p-1),50] 
  soybean_High1 <- riots_new[t:(t+p-1),51] 
  soybean_Low1 <- riots_new[t:(t+p-1),52] 
  silver_High1 <- riots_new[t:(t+p-1),53] 
  rice_Close1 <- riots_new[t:(t+p-1),54] 
  rice_High1 <- riots_new[t:(t+p-1),55] 
  platinum_Close1 <- riots_new[t:(t+p-1),56] 
  natural_gas_Close1 <- riots_new[t:(t+p-1),57] 
  monero_Close1 <- riots_new[t:(t+p-1),58] 
  monero_High1 <- riots_new[t:(t+p-1),59] 
  litecoin_Close1 <- riots_new[t:(t+p-1),60] 
  litecoin_Open1 <- riots_new[t:(t+p-1),61] 
  litecoin_Low1 <- riots_new[t:(t+p-1),62] 
  lead_High1 <- riots_new[t:(t+p-1),63] 
  lead_Low1 <- riots_new[t:(t+p-1),64] 
  Gold_Low1 <- riots_new[t:(t+p-1),65] 
  cotton_Low1 <- riots_new[t:(t+p-1),66] 
  corn_High1 <- riots_new[t:(t+p-1),67] 
  corn_Low1 <- riots_new[t:(t+p-1),68] 
  copper_Close1 <- riots_new[t:(t+p-1),69] 
  copper_High1 <- riots_new[t:(t+p-1),70] 
  coffee_Close1 <- riots_new[t:(t+p-1),71] 
  coffee_High1 <- riots_new[t:(t+p-1),72] 
  live_cattle_Close1 <- riots_new[t:(t+p-1),73] 
  live_cattle_High1 <- riots_new[t:(t+p-1),74] 
  live_cattle_Low1 <- riots_new[t:(t+p-1),75] 
  feed_cattle_Close1 <- riots_new[t:(t+p-1),76] 
  feed_cattle_Open1 <- riots_new[t:(t+p-1),77] 
  Brent_Close1 <- riots_new[t:(t+p-1),78] 
  Brent_Open1 <- riots_new[t:(t+p-1),79] 
  Bitcoin_Open1 <- riots_new[t:(t+p-1),80] 
  Bitcoin_Low1 <- riots_new[t:(t+p-1),81] 
  BAX_Open1 <- riots_new[t:(t+p-1),82] 
  BAX_High1 <- riots_new[t:(t+p-1),83] 
  protest_count1 <- riots_new[t:(t+p-1),84] 
  VAC_count1 <- riots_new[t:(t+p-1),85] 
  total_fatalities1 <- riots_new[t:(t+p-1),86] 
  category1 <- riots_new[t:(t+p-1),87]
  
  ## For time t
  al_wafa_Total_Posts2 <- riots_new[(t+1):(t+p),1] 
  bahrain_moi_Total_Posts2 <- riots_new[(t+1):(t+p),2] 
  BahrainRights_Favorites2 <- riots_new[(t+1):(t+p),3] 
  BahrainRights_Retweets2 <- riots_new[(t+1):(t+p),4] 
  BBCArabic_Total_Posts2 <- riots_new[(t+1):(t+p),5] 
  BBCArabic_Favorites2 <- riots_new[(t+1):(t+p),6] 
  BBCArabic_Retweets2 <- riots_new[(t+1):(t+p),7] 
  bh14feb2011_Total_Posts2 <- riots_new[(t+1):(t+p),8] 
  bh14feb2011_Favorites2 <- riots_new[(t+1):(t+p),9] 
  bh14feb2011_Retweets2 <- riots_new[(t+1):(t+p),10] 
  bna_ar_Total_Posts2 <- riots_new[(t+1):(t+p),11] 
  bna_ar_Retweets2 <- riots_new[(t+1):(t+p),12] 
  Coalition14_Total_Posts2 <- riots_new[(t+1):(t+p),13] 
  Coalition14_Favorites2 <- riots_new[(t+1):(t+p),14] 
  duraz_youth_Total_Posts2 <- riots_new[(t+1):(t+p),15] 
  duraz_youth_Favorites2 <- riots_new[(t+1):(t+p),16] 
  duraz_youth_Retweets2 <- riots_new[(t+1):(t+p),17] 
  feb14revolution_Retweets2 <- riots_new[(t+1):(t+p),18] 
  GDNonline_Total_Posts2 <- riots_new[(t+1):(t+p),19] 
  GDNonline_Favorites2 <- riots_new[(t+1):(t+p),20] 
  GDNonline_Retweets2 <- riots_new[(t+1):(t+p),21] 
  Iran_Total_Posts2 <- riots_new[(t+1):(t+p),22] 
  Iran_Favorites2 <- riots_new[(t+1):(t+p),23] 
  IranNW_Retweets2 <- riots_new[(t+1):(t+p),24] 
  malarab1_Favorites2 <- riots_new[(t+1):(t+p),25] 
  NABEELRAJAB_Favorites2 <- riots_new[(t+1):(t+p),26] 
  netanyahu_Favorites2 <- riots_new[(t+1):(t+p),27] 
  netanyahu_Retweets2 <- riots_new[(t+1):(t+p),28] 
  rouhani_Total_Posts2 <- riots_new[(t+1):(t+p),29] 
  rouhani_Favorites2 <- riots_new[(t+1):(t+p),30] 
  USEmbassyManama_Retweets2 <- riots_new[(t+1):(t+p),31] 
  TEMP2 <- riots_new[(t+1):(t+p),32] 
  DEWP2 <- riots_new[(t+1):(t+p),33] 
  WDSP2 <- riots_new[(t+1):(t+p),34] 
  PRCP2 <- riots_new[(t+1):(t+p),35] 
  zinc_Open2 <- riots_new[(t+1):(t+p),36] 
  zinc_Low2 <- riots_new[(t+1):(t+p),37] 
  WTI_Close2 <- riots_new[(t+1):(t+p),38] 
  WTI_Low2 <- riots_new[(t+1):(t+p),39] 
  wheat_Open2 <- riots_new[(t+1):(t+p),40] 
  wheat_High2 <- riots_new[(t+1):(t+p),41] 
  wheat_Low2 <- riots_new[(t+1):(t+p),42] 
  tin_Close2 <- riots_new[(t+1):(t+p),43] 
  tin_High2 <- riots_new[(t+1):(t+p),44] 
  tin_Low2 <- riots_new[(t+1):(t+p),45] 
  sugar_Close2 <- riots_new[(t+1):(t+p),46] 
  sugar_Open2 <- riots_new[(t+1):(t+p),47] 
  sugar_Low2 <- riots_new[(t+1):(t+p),48] 
  soybean_Close2 <- riots_new[(t+1):(t+p),49] 
  soybean_Open2 <- riots_new[(t+1):(t+p),50] 
  soybean_High2 <- riots_new[(t+1):(t+p),51] 
  soybean_Low2 <- riots_new[(t+1):(t+p),52] 
  silver_High2 <- riots_new[(t+1):(t+p),53] 
  rice_Close2 <- riots_new[(t+1):(t+p),54] 
  rice_High2 <- riots_new[(t+1):(t+p),55] 
  platinum_Close2 <- riots_new[(t+1):(t+p),56] 
  natural_gas_Close2 <- riots_new[(t+1):(t+p),57] 
  monero_Close2 <- riots_new[(t+1):(t+p),58] 
  monero_High2 <- riots_new[(t+1):(t+p),59] 
  litecoin_Close2 <- riots_new[(t+1):(t+p),60] 
  litecoin_Open2 <- riots_new[(t+1):(t+p),61] 
  litecoin_Low2 <- riots_new[(t+1):(t+p),62] 
  lead_High2 <- riots_new[(t+1):(t+p),63] 
  lead_Low2 <- riots_new[(t+1):(t+p),64] 
  Gold_Low2 <- riots_new[(t+1):(t+p),65] 
  cotton_Low2 <- riots_new[(t+1):(t+p),66] 
  corn_High2 <- riots_new[(t+1):(t+p),67] 
  corn_Low2 <- riots_new[(t+1):(t+p),68] 
  copper_Close2 <- riots_new[(t+1):(t+p),69] 
  copper_High2 <- riots_new[(t+1):(t+p),70] 
  coffee_Close2 <- riots_new[(t+1):(t+p),71] 
  coffee_High2 <- riots_new[(t+1):(t+p),72] 
  live_cattle_Close2 <- riots_new[(t+1):(t+p),73] 
  live_cattle_High2 <- riots_new[(t+1):(t+p),74] 
  live_cattle_Low2 <- riots_new[(t+1):(t+p),75] 
  feed_cattle_Close2 <- riots_new[(t+1):(t+p),76] 
  feed_cattle_Open2 <- riots_new[(t+1):(t+p),77] 
  Brent_Close2 <- riots_new[(t+1):(t+p),78] 
  Brent_Open2 <- riots_new[(t+1):(t+p),79] 
  Bitcoin_Open2 <- riots_new[(t+1):(t+p),80] 
  Bitcoin_Low2 <- riots_new[(t+1):(t+p),81] 
  BAX_Open2 <- riots_new[(t+1):(t+p),82] 
  BAX_High2 <- riots_new[(t+1):(t+p),83] 
  protest_count2 <- riots_new[(t+1):(t+p),84] 
  VAC_count2 <- riots_new[(t+1):(t+p),85] 
  total_fatalities2 <- riots_new[(t+1):(t+p),86] 
  category2 <- riots_new[(t+1):(t+p),87]
  
  # Now define the target
  category3 <- riots_new[(t+1+s):(t+p+s), 87]
  
  test <- data.frame(al_wafa_Total_Posts1, bahrain_moi_Total_Posts1, BahrainRights_Favorites1, BahrainRights_Retweets1, BBCArabic_Total_Posts1, BBCArabic_Favorites1, BBCArabic_Retweets1, bh14feb2011_Total_Posts1, bh14feb2011_Favorites1, bh14feb2011_Retweets1, bna_ar_Total_Posts1, bna_ar_Retweets1, Coalition14_Total_Posts1, Coalition14_Favorites1, duraz_youth_Total_Posts1, duraz_youth_Favorites1, duraz_youth_Retweets1, feb14revolution_Retweets1, GDNonline_Total_Posts1, GDNonline_Favorites1, GDNonline_Retweets1, Iran_Total_Posts1, Iran_Favorites1, IranNW_Retweets1, malarab1_Favorites1, NABEELRAJAB_Favorites1, netanyahu_Favorites1, netanyahu_Retweets1, rouhani_Total_Posts1, rouhani_Favorites1, USEmbassyManama_Retweets1, TEMP1, DEWP1, WDSP1, PRCP1, zinc_Open1, zinc_Low1, WTI_Close1, WTI_Low1, wheat_Open1, wheat_High1, wheat_Low1, tin_Close1, tin_High1, tin_Low1, sugar_Close1, sugar_Open1, sugar_Low1, soybean_Close1, soybean_Open1, soybean_High1, soybean_Low1, silver_High1, rice_Close1, rice_High1, platinum_Close1, natural_gas_Close1, monero_Close1, monero_High1, litecoin_Close1, litecoin_Open1, litecoin_Low1, lead_High1, lead_Low1, Gold_Low1, cotton_Low1, corn_High1, corn_Low1, copper_Close1, copper_High1, coffee_Close1, coffee_High1, live_cattle_Close1, live_cattle_High1, live_cattle_Low1, feed_cattle_Close1, feed_cattle_Open1, Brent_Close1, Brent_Open1, Bitcoin_Open1, Bitcoin_Low1, BAX_Open1, BAX_High1, protest_count1, VAC_count1, total_fatalities1, category1, al_wafa_Total_Posts2, bahrain_moi_Total_Posts2, BahrainRights_Favorites2, BahrainRights_Retweets2, BBCArabic_Total_Posts2, BBCArabic_Favorites2, BBCArabic_Retweets2, bh14feb2011_Total_Posts2, bh14feb2011_Favorites2, bh14feb2011_Retweets2, bna_ar_Total_Posts2, bna_ar_Retweets2, Coalition14_Total_Posts2, Coalition14_Favorites2, duraz_youth_Total_Posts2, duraz_youth_Favorites2, duraz_youth_Retweets2, feb14revolution_Retweets2, GDNonline_Total_Posts2, GDNonline_Favorites2, GDNonline_Retweets2, Iran_Total_Posts2, Iran_Favorites2, IranNW_Retweets2, malarab1_Favorites2, NABEELRAJAB_Favorites2, netanyahu_Favorites2, netanyahu_Retweets2, rouhani_Total_Posts2, rouhani_Favorites2, USEmbassyManama_Retweets2, TEMP2, DEWP2, WDSP2, PRCP2, zinc_Open2, zinc_Low2, WTI_Close2, WTI_Low2, wheat_Open2, wheat_High2, wheat_Low2, tin_Close2, tin_High2, tin_Low2, sugar_Close2, sugar_Open2, sugar_Low2, soybean_Close2, soybean_Open2, soybean_High2, soybean_Low2, silver_High2, rice_Close2, rice_High2, platinum_Close2, natural_gas_Close2, monero_Close2, monero_High2, litecoin_Close2, litecoin_Open2, litecoin_Low2, lead_High2, lead_Low2, Gold_Low2, cotton_Low2, corn_High2, corn_Low2, copper_Close2, copper_High2, coffee_Close2, coffee_High2, live_cattle_Close2, live_cattle_High2, live_cattle_Low2, feed_cattle_Close2, feed_cattle_Open2, Brent_Close2, Brent_Open2, Bitcoin_Open2, Bitcoin_Low2, BAX_Open2, BAX_High2, protest_count2, VAC_count2, total_fatalities2, category2, category3)

  ## Applying RF
  RandomForest <- randomForest(category3 ~ ., data=train, importance = TRUE, ntrees = nrow(train), weights = 1000/table(train$category))
  predRF <- predict(RandomForest, newdata = test, type = "response")
      
  RF.Pre[[k]] <- predRF
  confusion_matrix <- confusionMatrix(predClassRF, test$category3)
  
  RF.accuracy[[k]] <- confusion_matrix$overall[1]
  RF.CM.List[[k]] <- multi_class_rates(as.matrix(confusion_matrix))
}
# Create variables to calculate the mean of each element inside the dataframe 
# True positives
TP.HR <- list()
TP.LR <- list()
TP.MR <- list()

# False positives
FP.HR <- list()
FP.LR <- list()
FP.MR <- list()

# True negatives
TN.LR <- list()
TN.MR <- list()
TN.HR <- list()

# False negatives
FN.LR <- list()
FN.MR <- list()
FN.HR <- list()

i <- 1 # Initialize a counter
while(i <= length(RF.CM.List)){ # Iterate over every dataframe
  TP.HR[[i]] <- RF.CM.List[[i]][[1,1]]
  TP.LR[[i]] <- RF.CM.List[[i]][[2,1]]
  TP.MR[[i]] <- RF.CM.List[[i]][[3,1]]
  
  # False positives
  FP.HR[[i]] <- RF.CM.List[[i]][[1,2]]
  FP.LR[[i]] <- RF.CM.List[[i]][[2,2]]
  FP.MR[[i]] <- RF.CM.List[[i]][[3,2]]
  
  # True negatives
  TN.HR[[i]] <- RF.CM.List[[i]][[1,3]]
  TN.LR[[i]] <- RF.CM.List[[i]][[2,3]]
  TN.MR[[i]] <- RF.CM.List[[i]][[3,3]]
  
  # False negatives
  FN.HR[[i]] <- RF.CM.List[[i]][[1,4]] 
  FN.LR[[i]] <- RF.CM.List[[i]][[2,4]]
  FN.MR[[i]] <- RF.CM.List[[i]][[3,4]]
  
  i = i + 1 # Increment the counter
}

# Remove null elements from list
TP.HR <- TP.HR[!sapply(TP.HR, is.null)]
TP.LR <- TP.LR[!sapply(TP.LR, is.null)]
TP.MR <- TP.MR[!sapply(TP.MR, is.null)]

FP.HR <- FP.HR[!sapply(FP.HR, is.null)]
FP.LR <- FP.LR[!sapply(FP.LR, is.null)]
FP.MR <- FP.MR[!sapply(FP.MR, is.null)]

TN.HR <- TN.HR[!sapply(TN.HR, is.null)]
TN.LR <- TN.LR[!sapply(TN.LR, is.null)]
TN.MR <- TN.MR[!sapply(TN.MR, is.null)]

FN.HR <- FN.HR[!sapply(FN.HR, is.null)]
FN.LR <- FN.LR[!sapply(FN.LR, is.null)]
FN.MR <- FN.MR[!sapply(FN.MR, is.null)]
# Create a matrix that averages true positives, true negatives, false positives, and false negatives
final_matrix <-matrix(data = NA, nrow = 3, ncol = 4)
colnames(final_matrix) <- c("True Positive", "False Positive", "True Negative", "False Negative")
rownames(final_matrix) <- c("High Risk", "Low Risk", "Moderate Risk")
final_matrix[1,1] <- mean(sapply(TP.HR, mean))
final_matrix[2,1] <- mean(sapply(TP.LR, mean))
final_matrix[3,1] <- mean(sapply(TP.MR, mean))
final_matrix[1,2] <- mean(sapply(FP.HR, mean))
final_matrix[2,2] <- mean(sapply(FP.LR, mean))
final_matrix[3,2] <- mean(sapply(FP.MR, mean))
final_matrix[1,3] <- mean(sapply(TN.HR, mean))
final_matrix[2,3] <- mean(sapply(TN.LR, mean))
final_matrix[3,3] <- mean(sapply(TN.MR, mean))
final_matrix[1,4] <- mean(sapply(FN.HR, mean))
final_matrix[2,4] <- mean(sapply(FN.LR, mean))
final_matrix[3,4] <- mean(sapply(FN.MR, mean))

# Call to the final matrix
final_matrix
##               True Positive False Positive True Negative False Negative
## High Risk           0.00000       1.000000      30.00000       0.000000
## Low Risk            0.00000       6.272727      24.72727       0.000000
## Moderate Risk      23.72727       0.000000       0.00000       7.272727
# Calculate ratios for true positive, true negative, false positive, and false negative
TPR <- (final_matrix[1,1] + final_matrix[2,1] + final_matrix[3,1])/(final_matrix[1,1] + final_matrix[2,1] + final_matrix[3,1] + final_matrix[1,2] + final_matrix[2,2] + final_matrix[3,2])

TNR <- (final_matrix[1,3] + final_matrix[2,3] + final_matrix[3,3])/(final_matrix[1,3] + final_matrix[2,3] + final_matrix[3,3] + final_matrix[1,4] + final_matrix[2,4] + final_matrix[3,4])

FPR <- (final_matrix[1,2] + final_matrix[2,2] + final_matrix[3,2])/(final_matrix[1,1] + final_matrix[2,1] + final_matrix[3,1] + final_matrix[1,2] + final_matrix[2,2] + final_matrix[3,2])
  
FNR <- (final_matrix[1,4] + final_matrix[2,4] + final_matrix[3,4])/(final_matrix[1,3] + final_matrix[2,3] + final_matrix[3,3] + final_matrix[1,4] + final_matrix[2,4] + final_matrix[3,4])
## [1] "The overall accuracy of the model is 76.54%."
## [1] "The overall false positive rate of the model is 23.46%."
## [1] "The overall false negative rate of the model is 11.73%."
## [1] "The overall true negative rate of the model is 88.27%."
## [1] "The overall true positive rate of the model is 76.54%."

Questions

Cons: o Arguably, the most significant con of our approach is the loss of granularity. We are no longer able to predict a number of events whether it’s the reduced dimension PCA value, riot count, or the total number of events. o With any RandomForest model, we are unable to accurately understand the model’s characteristics due to RandomForests being “Black Box” models. o For our particular dataset, our predictors may not have ample predictive power. When analyzing ACLED’s data collection techniques, we can see that their Bahraini data overwhelmingly comes from social media and in particular Twitter. As a result, one would think that predicting events using this data would’ve yielded better results. o Unknown to us, our prediction trees may be correlated, which is a byproduct of RandomForest.

Limitations: o We can see from the univariate time series data above that the data is on an apparent downward trend, which may be causal. As a result, trying to accurately predict the number of events may be difficult if a lower number of events is the new normal. o The dataset somewhat accurately represents the ACLED target variables. Despite ACLED drawing its data directly from Twitter, our dataset contains a significant amount of noise. o Time series data are difficult to accurately predict.

Way Forward

As previously noted, raw data from across 191 PMESII-PT variables were collected; however, additional data is necessary to increase overall accuracy of the model. Accounting for nearly 98% of all internet searches in Bahrain, Google dominates as the number one internet search engine. Therefore, incorporating daily search engine data from Google may be worthwhile. Unfortunately, Google Trends provides only a relative usage value and not a raw number of user searches. In the event Google decides to release its raw data or an agreement can be reached to incorporate Google’s raw data, it may prove to be of value as well. Daily arrest statistics may be have an impact since the arrest of any key opposition members will inherently be a point of contention amongst low-level leaders and their followers. Since opposition members primarily utilize mobile phones to communicate and coordinate protests, daily cell phone data usage rates may prove to be statistically significant. Searching and accounting for key terms within articles, both Arabic and English, is possible with the advent of simple coding software and can be incorporated into each model.

As evidenced in our predictor selection, social media plays a predominant role in each of these models. Social media will only continue to rise as users continue to create profiles, develop original content, and facilitate social networking. The rise of the Arabic language is increasingly dominating social interactions online as these mediums are becoming the de facto location for people and organizations to connect and interact in Bahrain and throughout the Middle East. As evidenced with the Black Lives Matter movement in the United States, social media allows individuals to organize, mobilize, and react almost instantaneously to polarizing events. Bahrain is no exception to this as evidenced with the Arab Spring in 2011 and the opposition’s usage of social media today. However, access and utilization of social media is no longer limited to youth as older generations are increasing their usage as well. Social media plays an increasing vital role in the daily lives of billions around the world.

The models mentioned in this project merely demonstrates a prototype capability that can be modified, improved, and replicated across various countries. For example, famine and unrest in Yemen or refugee migration in Syria may be predicted utilizing similar methods. Furthermore, if this concept is applied to other countries within the Middle East, a cumulative model may be developed to predict the next Arab Spring. In addition to application at the country level, this concept can be applied to districts in an attempt to model behavior in a particular area. Applying these concepts at the micro and macro level to regions across the world allows a supported command to coordinate with host nation partners to affect and mitigate contributing factors.

Conclusion

By collecting civil information through data mining, one is able to develop a holistic understanding of the operational environment. Moreover, this holistic understanding of the civil-sector components increases a command’s ability to affect events within an operational environment. Despite the randomness associated with human behavior, the team demonstrated the ability to model daily protests given data surrounding the operational environment. By analyzing seasonality within the data, the team identified that there may be an intentional or coincidental cycle of seven to ten days. Identifying this pattern will enable effective decision making for appropriate force protection measures. The team’s analysis of high-risk areas yielded sufficient evidence that the base is not under imminent threat but the situation should be monitored. The team analyzed the relationship between violent demonstrations and non-violent protests and concluded that the two are independent and identically distributed. The significance of this cannot be understated as it established two distinct groups: peaceful demonstrators and agitators. These two actors can be addressed separately to resolve their grievances. The team also analyzed the relationship between violent actions between Bahraini forces and its populace. Although the team did not immediately identify a relationship, further analysis should be done to determine if a relationship exists between violence against civilians and future violent demonstrations or other event types.

These models could benefit from additional data sources such as Google search history, social media, and economic data to improve its accuracy. Additional historical protest data would allow the analyst team to understand seasonality and trends from year to year. Supplementary modeling techniques such as segmented or Poisson regression may also prove fruitful for future efforts. Data analytics allows for accurate modeling and prediction of not just violent protests in Bahrain but the methodology discussed in this project could be adapted and applied to nearly any country within the Middle East. A 60 to 70% certainty could prove instrumental for military commanders who are utilizing military intelligence as their sole source of information and are merely speculating as to the enemy’s strategies.